THE ANNALS 
of
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


Choice of One Among Several Statistical Hypotheses. RALPH J. BROOKNER

A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of the Variance. CHARLES STEIN

Compact Computation of the Inverse of a Matrix. FREDERICK V. WAUGH AND PAUL S. DWYER

Multiple Matching and Runs by the Symbolic Method. IRVING KAPLANSKY AND JOHN RIORDAN

On the Power Functions for the E²-Test and the T²-Test.

Some Generalizations of the Theory of Cumulative Sums of Random Variables. ABRAHAM WALD

On the Design of Experiments for Weighing and Making Other Types of Measurements. K. KISHEN

Notes:

Note on the Law of Large Numbers and “Fair” Games. W. FELLER

A Note on Rank, Multicollinearity and Multiple Regression. GERHARD TINTNER




Vol. XVI, No. 3 — September, 1945 







THE ANNALS
OF MATHEMATICAL STATISTICS

EDITED BY

S. S. WILKS, Editor


C. C. CRAIG W. FELLER J. NEYMAN 
ALLEN T. CRAIG THORNTON C. FRY WALTER A. SHEWHART 
W. EDWARDS DEMING HAROLD HOTELLING A. WALD 


WITH THE COOPERATION OF 


WILLIAM G. COCHRAN     PAUL S. DWYER         WILLIAM G. MADOW
J. H. CURTISS          CHURCHILL EISENHART   ALEXANDER M. MOOD
J. F. DALY             PAUL R. HALMOS        HENRY SCHEFFÉ
HAROLD F. DODGE        PAUL G. HOEL          JACOB WOLFOWITZ


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given
issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December. Because of war-time difficulties of publication,
issues may often be from two to four weeks late in appearing.
Subscribers are therefore requested to wait at least 30 days after month of issue
before making inquiries concerning non-delivery.

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in foot- 
notes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50. 
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 











CHOICE OF ONE AMONG SEVERAL STATISTICAL HYPOTHESES 


By RALPH J. BROOKNER¹
New York City 


1. Introduction. Statistical decision is a term which we will apply to that 
phase of statistical inference which deals with the following question. Consider
one or several variates whose distribution function depends on one or
several unknown parameters; suppose there be given a finite number of mutually 
exclusive hypotheses regarding the parameters, whose totality completely ex- 
hausts every possibility. If a sample of observations on the variates is made, 
the choice of one of the given hypotheses on the basis of that sample is called a 
statistical decision. In other words, to make a statistical decision is to give a 
procedure which will divide the sample space into as many regions as there are
given hypotheses, and to set up a one-to-one correspondence between these 
regions and the hypotheses so that if the sample point lies in any particular 
region, the corresponding hypothesis is chosen. 

This notion is quite closely connected with both of the fields of statistical 
inference that have engaged most of the modern statistical theorists. On the one 
hand, it may be considered a generalization of the notion of testing hypotheses, 
for in this theory, one gives a procedure which divides the sample space into a 
region of rejection and a region of non-rejection of a given null hypothesis. 
Then one makes either of two decisions depending upon which of the regions 
contains the sample point. On the other hand, the theory of estimation is a 
generalization of the notion of statistical decision in which the number of
alternatives is not restricted to be finite.

As in any phase of statistical inference, our primary aim is to define broad 
principles upon which “good” or “best” procedures for making statistical
decisions may be based. The general problem of statistical decisions has been
formulated by A. Wald, who has also proposed a principle on which the solution can
be based. We are interested, however, in several of the simpler but important 
particular problems in which quite serious calculation difficulties are encountered 
in actually finding Wald’s solution. Hence, we will propose in its stead another 
principle which quite closely resembles Wald’s for selecting a solution of the 
problem of statistical decision. 

It may be pointed out immediately that, from a purely logical point of view,
the substitute principle we shall offer will probably be considered to be less 
acceptable than its predecessor. We will find, however, by considering its 
application to some of the well known problems of testing hypotheses, that the 
principle is at least reasonable in leading to certain well accepted results.


1 Research under a grant-in-aid of the Carnegie Corporation of New York. 







2. Principle determining the “best” procedure. We will first discuss briefly
Wald’s principle; the criterion that we will employ will then be defined
by pointing out the differences. A much more general formulation
is possible [1], [2], but we will discuss the principle as it will be directly
applied to the problems of statistical decisions when the number of hypotheses
is finite.

Consider the variates $x_1, x_2, \dots, x_p$ whose probability density function
$f(x_1, x_2, \dots, x_p \mid \theta_1, \theta_2, \dots, \theta_k)$ is known except for the unknown values of
the parameters $\theta_1, \theta_2, \dots, \theta_k$. We denote by $\theta$ a point in $k$-dimensional
space whose coordinates are $(\theta_1, \theta_2, \dots, \theta_k)$ and shall speak of this parameter
space as $\Omega$. Suppose that $\omega$ is any subset of $\Omega$ and that $S$ represents a system
of finitely many such sets which are mutually disjunct and which cover $\Omega$.
Each element, $\omega$, of $S$ corresponds to a hypothesis $H_\omega$, which is the hypothesis
that $\theta$ is a point of $\omega$, and the system of all such hypotheses corresponding to $S$
we denote by $H_S$.

A sample of $N$ observations on $x_1, x_2, \dots, x_p$ is drawn and the sample may be
considered as a point, $E$, in the $pN$-dimensional sample space; denote the sample
space by $M$. We want to decide on the basis of the point $E$ which of the
hypotheses of $H_S$ should be accepted. That is, we seek a procedure by which the
sample space may be divided into a system of mutually exclusive regions $M_\omega$,
which are the same in number as the number of elements of $S$, and by which a
correspondence is set up so that the falling of the sample point into a particular
$M_\omega$ shall cause us to accept a particular hypothesis $H_\omega$ as the true one. If
the totality of regions $M_\omega$ be denoted $M_S$, it is necessary to give a principle by
which we may prefer a particular system $M_S$ over any other such system.

Wald introduces the notion of a weight function of errors, a function of the
parameters and of the decision made, which might well be defined as the loss
incurred if $\theta$ be the true parameter point and the sample point falls in $M_\omega$, which
causes us to accept the hypothesis $H_\omega$. Denote the weight function by $W(\theta, \omega_E)$
where $\omega_E$ stands for that hypothesis which we choose if $E$ is the sample point;
then we require that $W(\theta, \omega_E)$ be non-negative, and if $\theta$ lies in $\omega_E$, $W(\theta, \omega_E) = 0$,
for then the correct decision has been made and there is no loss.

Perhaps the notion of a weight function can be most clearly understood, and
its importance appreciated, if we consider the place of statistics in the business
world, where possible losses are often computable in terms of money. The
weight function may be taken to be equal to this loss. Suppose a manufacturing
plant has a process which manufactures a product whose efficiency is a measurable
quantity that we will denote by $x$. Suppose $x$ is a random variable whose
distribution depends only upon its mean value $\theta$, and the company contemplates
renewing its machinery if the mean value of the efficiency falls short significantly
from a particular value $\theta_0$. Then on the basis of a sample of $N$ observations on
$x$, one of two decisions must be reached: the rejection of the hypothesis $\theta = \theta_0$
(the decision to renew the machinery), or the non-rejection of $\theta = \theta_0$ (the decision
not to renew it). Suppose the region $M_\omega$ is the region of the sample space such







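that if $E$ falls into $M_\omega$, we reject $\theta = \theta_0$, and $M_{\bar\omega}$ is the complementary region.
Then we may say that the weight function can be defined by

\[
\begin{aligned}
W(\theta, \bar\omega) &= 0 && \text{for } \theta \ge \theta_0 \\
W(\theta, \bar\omega) &= g(\theta) && \text{for } \theta < \theta_0 \\
W(\theta, \omega) &= 0 && \text{for } \theta < \theta_0 \\
W(\theta, \omega) &= h(\theta) && \text{for } \theta \ge \theta_0
\end{aligned}
\]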


where $h(\theta)$ is the company’s monetary loss in needlessly changing its machinery
and $g(\theta)$ is a function which expresses the company’s loss in not changing its
process even though the true value of the parameter is $\theta < \theta_0$. The function
$g(\theta)$ may be of almost any form, but it is only reasonable that it should be a
monotonic non-decreasing function of $|\theta - \theta_0|$, since the loss should, it seems,
increase as the true value of $\theta$ is farther from $\theta_0$.
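
As a rough illustration of the weight function in the machinery example, the following Python sketch encodes its four cases. The particular forms chosen for g and h (a loss proportional to the shortfall, and a fixed replacement cost), the numerical values, and all names are assumptions made only for illustration; the text requires no more than that g be non-decreasing in |θ − θ₀|.

```python
def make_weight(theta0, g, h):
    """Return W(theta, decision) for the machinery example.

    decision is "renew" (reject theta = theta0) or "keep" (do not reject).
    g and h are hypothetical loss functions supplied by the caller.
    """
    def weight(theta, decision):
        if decision == "keep":
            # Loss g(theta) only when the efficiency has truly fallen short.
            return g(theta) if theta < theta0 else 0.0
        if decision == "renew":
            # Loss h(theta) only when the machinery was renewed needlessly.
            return h(theta) if theta >= theta0 else 0.0
        raise ValueError("decision must be 'keep' or 'renew'")
    return weight


THETA0 = 10.0  # assumed target efficiency
W = make_weight(
    THETA0,
    g=lambda theta: 500.0 * abs(theta - THETA0),  # assumed: grows with the shortfall
    h=lambda theta: 2000.0,                       # assumed: fixed cost of renewal
)
```

With these assumed forms, a correct decision always costs nothing, and the loss from keeping worn machinery grows with the distance of θ below θ₀.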

Wald then defines the risk as the expected value of the loss; since $\theta$ is an
unknown, the risk will be a function of $\theta$, and it will also be a function of the system
$M_S$:

\[
r(\theta, M_S) = \int_M W(\theta, \omega_E)\, f(E \mid \theta)\, dE.
\]


According to Wald, the “best” system of regions, $M_S$, is that system for which
the maximum of the risk function with respect to the parameter $\theta$ is a minimum
with respect to all possible systems, $M_S$, of regions. Several important properties
are enjoyed by the system of regions defined in this way, though other
reasonable definitions are possible. Perhaps the criterion of minimizing an
average with respect to $\theta$ of $r(\theta, M_S)$ rather than the maximum may be
considered more plausible, but such definitions would raise the question of which
average should be used, and the result obtained by using any particular average
would not be invariant with respect to transformations of the parameter space.
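
The minimax idea can be sketched numerically on a small discrete problem, where the integral defining the risk becomes a sum. Everything specific here is an assumption chosen for illustration and is not taken from the paper: a binomial observation with 9 trials, two hypotheses (θ ≤ 1/2 and θ > 1/2), a loss of |θ − 1/2| for a wrong decision, decision rules of the form "choose θ > 1/2 when x ≥ cutoff", and a finite grid standing in for the parameter space.

```python
from math import comb


def risk(theta, cutoff, n):
    """Expected loss when 'theta > 1/2' is chosen if and only if x >= cutoff."""
    total = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * theta**x * (1 - theta) ** (n - x)
        chose_upper = x >= cutoff
        wrong = chose_upper != (theta > 0.5)
        total += (abs(theta - 0.5) if wrong else 0.0) * prob
    return total


def max_risk(cutoff, n, grid):
    """Maximum of the risk over a finite grid of parameter values."""
    return max(risk(theta, cutoff, n) for theta in grid)


N_TRIALS = 9
GRID = [i / 100 for i in range(101)]
# Among the cutoff rules, pick the one whose maximum risk is smallest.
best_cutoff = min(range(N_TRIALS + 2), key=lambda c: max_risk(c, N_TRIALS, GRID))
```

The search is restricted to one family of rules, so it only illustrates the criterion; it does not find the best system of regions among all possible systems.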

Using the notations as introduced above, and introducing the notation $W(\theta, \omega_i)$
to be the weight function if the $i$th hypothesis is chosen, the principle which
we will use to solve some of the problems of statistical decisions can be given as
follows: In place of the risk function, we consider the $s$ functions

\[
R_i(\theta, E) = W(\theta, \omega_i)\, f(E \mid \theta) \qquad (i = 1, 2, \dots, s)
\]

where $f(E \mid \theta)$ is a notation for the probability density, and $s$ is the number of
given hypotheses. If we denote by $R_i(E)$ the least upper bound of $R_i(\theta, E)$
with respect to $\theta$, then we choose the system of “best” regions of acceptance by
including each sample point $E$ in a region $M_i$ determined such that for all $E_0$ in
$M_i$, $R_i(E_0) \le R_j(E_0)$ for all $j \ne i$.
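
The rule just stated can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's computation: the least upper bound over θ is approximated by a maximum over a finite grid, and the example problem (one observation from a unit-variance normal, two hypotheses about the sign of the mean, loss |θ| for a wrong decision) and every name are hypothetical.

```python
import math


def choose_hypothesis(sample, hypotheses, thetas, weight, density):
    """Pick the hypothesis i minimizing R_i(E) = sup_theta W(theta, i) f(E | theta).

    The supremum is approximated by a maximum over the finite grid `thetas`.
    """
    def r_sup(i):
        return max(weight(theta, i) * density(sample, theta) for theta in thetas)
    return min(hypotheses, key=r_sup)


# Illustrative setup (assumed): one observation, unit-variance normal density.
def normal_density(x, theta):
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)


def sign_weight(theta, i):
    """Zero loss for a correct decision, |theta| otherwise (assumed form)."""
    correct = (theta < 0) if i == "negative" else (theta >= 0)
    return 0.0 if correct else abs(theta)


THETAS = [j / 100 for j in range(-500, 501)]
HYPOTHESES = ["negative", "nonnegative"]
```

For instance, `choose_hypothesis(1.3, HYPOTHESES, THETAS, sign_weight, normal_density)` compares the two bounds for the observation 1.3 and returns the label with the smaller one.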

It is interesting to note that a rather general case exists in which the principle
is exactly equivalent with the test of a hypothesis based upon the likelihood
ratio principle. Consider the distribution function
$f(x_1, x_2, \dots, x_p \mid \theta_1, \theta_2, \dots, \theta_k)$ which is a bounded function of the $x$’s and $\theta$’s. Suppose we are
interested in the test of the hypothesis $(\theta_1, \theta_2, \dots, \theta_k) \in \omega$ where $\omega$ is a closed









set of points of the parameter space which does not contain any open subset of
the parameter space. Furthermore assume that for each set of $x$’s the distribution
function is continuous in $\theta_1, \dots, \theta_k$ on an open subset of $\Omega$ containing $\omega$.
We will show that the principle will lead to the test based on the likelihood
ratio if the following is the weight function:
I. If $\omega$ is accepted, the loss is zero if the true parameter point is in $\omega$, and the
loss is a constant $c_1$ if the true parameter point is not in $\omega$.
II. If $\omega$ is rejected (i.e. $\bar\omega$ is chosen), the loss is zero if the true parameter
point is in $\bar\omega$ and is a constant $c_2$ if the true parameter point is in $\omega$.
Consider then the region of the sample space for which $\omega$ is rejected according
to the principle. This region is that for which

\[
\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } [c_2 f(x \mid \theta)] \;\le\; \text{l.u.b. w.r.t. } \theta \text{ in } \bar\omega \text{ of } [c_1 f(x \mid \theta)]
\]

where we have set $f(x \mid \theta) = f(x_1, x_2, \dots, x_p \mid \theta_1, \theta_2, \dots, \theta_k)$, and where l.u.b.
w.r.t. means “least upper bound with respect to.” But the left-hand member
of this inequality is equal to

\[
c_2\,[\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)]
\]

and because of the restriction on $\omega$ and the continuity of $f$, we can see that the
l.u.b. of $f(x \mid \theta)$ with respect to all $\theta$ in $\bar\omega$ must coincide with the l.u.b. of the
function with respect to all $\theta$ in $\Omega$, which is the total parameter space. Thus
we have that the hypothesis $\omega$ is rejected when

\[
c_2\,[\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)] \;\le\; c_1\,[\text{l.u.b. w.r.t. } \theta \text{ in } \Omega \text{ of } f(x \mid \theta)]
\]

or when

\[
\frac{\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)}{\text{l.u.b. w.r.t. } \theta \text{ in } \Omega \text{ of } f(x \mid \theta)} \;\le\; \frac{c_1}{c_2}.
\]

The left-hand member of this inequality is the likelihood ratio statistic
introduced by Neyman and Pearson [3]; hence our test is exactly equivalent with the
likelihood ratio test where the size of the critical region is determined by $c_1$
and $c_2$.
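
The equivalence can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not in the paper: the sample mean of N unit-variance normal observations, the hypothesis ω consisting of the single point θ = 0, constants c₁ = 1 and c₂ = 4, and a fine grid standing in for the complement of ω.

```python
import math


def density(xbar, theta, n):
    """Density of the sample mean up to a constant factor (unit variance assumed)."""
    return math.exp(-0.5 * n * (xbar - theta) ** 2)


def rejects_by_principle(xbar, n, c1, c2, grid):
    """Reject omega = {0} when c2 * sup over omega <= c1 * sup over its complement."""
    sup_in_omega = density(xbar, 0.0, n)
    sup_outside = max(density(xbar, t, n) for t in grid if t != 0.0)
    return c2 * sup_in_omega <= c1 * sup_outside


def rejects_by_likelihood_ratio(xbar, n, c1, c2):
    """Reject when the likelihood ratio is at most c1 / c2."""
    # The supremum over the whole parameter space is attained at theta = xbar.
    ratio = density(xbar, 0.0, n) / density(xbar, xbar, n)
    return ratio <= c1 / c2


GRID = [i / 1000 for i in range(-5000, 5001)]
```

Because the grid only approximates the complement of ω, the two rules can disagree for sample means lying extremely close to the boundary of the critical region; away from that boundary they give the same decision.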

We pose the following quite hypothetical example to show circumstances 
under which the principle proposed is reasonable. The principle does not 
exactly apply as it was stated in terms of probability densities and the example 
involves discrete probabilities, but the logic seems somewhat applicable. Sup- 
pose a game is played which consists of the player’s guessing the number of white 
balls in an urn known to contain 10 balls, each of which is either white or black, 
on the basis of a sample of four drawings with replacement from the urn. Let
us assume that there are eleven mutually exclusive hypotheses (as to the number 
of white balls in the urn) to choose among, and the player must make a choice 
of one of them after observing the drawing which can give 16 different results. 
Assume that the one who plays the game pays a banker a varying sum of money 
if he makes a wrong decision and that the banker has the privilege of choosing 







the population (i.e. the number of white and black balls originally in the urn). 
Now on the basis of the assumption that the banker knows the player’s decision 
function and will attempt to fix the population so as to make the player’s
expected loss a maximum, it is clear that Wald’s principle, which minimizes the
maximum loss, leads to the best way to play the game. 

Now suppose that instead of one player making the choice among the deci- 
sions, we have 16 players participating in the game and the first player is to 
make the choice if, and only if, the drawing is WWWW, the second player if the
drawing is WWWB, and so on, where W stands for the drawing of a white ball
and B for the drawing of a black one. In this case, if player x assumes that the
banker will try to choose the population most unfavorable to him, then his 
decision function based on the new principle is the best method of play. 
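
The sixteen-player version of the game can be sketched as follows. The payment schedule is not specified in the text, so this sketch assumes, purely for illustration, a unit payment for any wrong guess; the function names are hypothetical. Each player looks only at his own drawing and picks the guess whose worst case (loss times probability of that drawing, maximized over the populations the banker might choose) is smallest.

```python
def outcome_probability(outcome, whites, total=10):
    """Probability of an ordered drawing such as 'WWWB' with replacement."""
    p = whites / total
    prob = 1.0
    for ball in outcome:
        prob *= p if ball == "W" else 1.0 - p
    return prob


def player_choice(outcome, total=10):
    """Guess chosen by the player responsible for this drawing.

    Assumes a unit loss for every wrong guess (not specified in the text).
    """
    hypotheses = range(total + 1)

    def worst_case(guess):
        # Largest value of loss * probability over populations other than the guess.
        return max(
            outcome_probability(outcome, whites, total)
            for whites in hypotheses
            if whites != guess
        )

    return min(hypotheses, key=worst_case)
```

Under the unit-loss assumption the chosen guess is the population that makes the observed drawing most probable, since removing that population from the banker's options lowers the worst case the most.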

Although the example indicates that in the usual case which would come up 
in practice, Wald’s principle would lead to the better procedure, since the 
statistician is usually faced with the necessity of giving a decision no matter 
what the sample point is, the new principle is useful since one may hope that in 
many practical cases the two principles will not lead to widely varying results, 
especially if the sample is large. 


3. Application of the criterion to the case of testing the mean of a normal
distribution. Now we will show that the criterion will lead to the widely used
test of “Student’s hypothesis.” Suppose $x$ is known to be distributed normally
with unknown mean $\mu$ and unknown variance $\sigma^2$. On the basis of a sample of $N$
independent observations $x_1, x_2, \dots, x_N$, “Student’s $t$” is used to test the
hypothesis $\mu = 0$. If $\bar x$ is the arithmetic mean of the $N$ observations and $s^2$ the
usual sample estimate of the variance, then with $t = \sqrt{N}\,\bar x/s$, the hypothesis
is to be rejected if $|t| \ge t_0$ where $t_0$ is a critical value at some chosen level of
significance $\alpha$ obtained from the distribution of $t$ under the null hypothesis. We
will use the notation $\omega_1$ for the set of points $\mu \ne 0$ and $\omega_2$ for the set of points
$\mu = 0$.

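We will consider the problem in reference to the particular weight function
defined as follows:

\[
\begin{aligned}
W(\mu, \sigma; \omega_2) &= (\mu/\sigma)^k && \text{for } \mu \ne 0 \\
W(0, \sigma; \omega_1) &= W \\
W(\mu, \sigma; \omega_1) &= 0 && \text{for } \mu \ne 0 \\
W(0, \sigma; \omega_2) &= 0
\end{aligned}
\]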


where as a matter of convenience, we will take $k$ an even positive integer in order
to avoid the introduction of the absolute value of $\mu/\sigma$ which is necessary if $k$
is an odd integer. We also take $k < N$.

The density function of the sample of $N$ observations is

\[
\frac{C}{\sigma^N}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2}
\]















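where $C$ is a constant. Then the two functions $R_i(\theta, E)$ are

\[
\begin{aligned}
R_1(\theta, E) &= \frac{CW}{\sigma^N}\, e^{-(1/2\sigma^2)\, S x_\alpha^2} && \text{if } \mu = 0 \\
R_1(\theta, E) &= 0 && \text{if } \mu \ne 0 \\
R_2(\theta, E) &= \frac{C\mu^k}{\sigma^{N+k}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} && \text{if } \mu \ne 0 \\
R_2(\theta, E) &= 0 && \text{if } \mu = 0.
\end{aligned}
\]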


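To maximize $R_1(\theta, E)$, we set

\[
\frac{\partial R_1(\theta, E)}{\partial \sigma} = \left[ -\frac{NWC}{\sigma^{N+1}} + \frac{WC\, S x_\alpha^2}{\sigma^{N+3}} \right] e^{-(1/2\sigma^2)\, S x_\alpha^2} = 0
\]

which gives

\[
\sigma^2 = \frac{S x_\alpha^2}{N};
\]

hence

\[
R_1(E) = \frac{C W N^{N/2}}{(S x_\alpha^2)^{N/2}}\, e^{-N/2}.
\]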

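To maximize $R_2(\theta, E)$, we set

\[
\frac{\partial R_2(\theta, E)}{\partial \mu} = \left[ \frac{k}{\mu} + \frac{S(x_\alpha - \mu)}{\sigma^2} \right] \frac{C\mu^k}{\sigma^{N+k}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} = 0
\]

and

\[
\frac{\partial R_2(\theta, E)}{\partial \sigma} = \left[ -(N + k) + \frac{S(x_\alpha - \mu)^2}{\sigma^2} \right] \frac{C\mu^k}{\sigma^{N+k+1}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} = 0
\]

which give the two relations

\[
\sigma^2 = -\frac{\mu}{k}\, S(x_\alpha - \mu)
\]

and

\[
\sigma^2 = \frac{1}{N + k}\, S(x_\alpha - \mu)^2.
\]

Then

\[
-\mu (N + k)\, S(x_\alpha - \mu) = k\, S(x_\alpha - \mu)^2
\]

or

\[
\mu^2 - \mu \bar x (1 - k/N) - (k/N^2)\, S x_\alpha^2 = 0
\]

which gives the maximizing value of $\mu$:

\[
\mu^* = \frac{\bar x (1 - k/N) \pm \sqrt{\bar x^2 (1 - k/N)^2 + (4k/N^2)\, S x_\alpha^2}}{2}
\]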







and it can easily be shown that the maximum is reached for the value of $\mu^*$
using the + sign when $\bar x$ is positive and the − sign when $\bar x$ is negative. We will
carry through the case $\bar x > 0$ only as the case $\bar x < 0$ follows in a similar manner.
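
The closed form for the maximizing value of μ can be checked numerically. The sketch below is only an illustration: the sample values and k = 2 are made up, the constant C is dropped, and the function names are hypothetical. It evaluates the logarithm of R₂ at the closed-form point, where σ² is taken as S(x − μ)²/(N + k), and the accompanying checks compare it with nearby points.

```python
import math


def log_r2(mu, sigma, xs, k):
    """log of mu^k / sigma^(N+k) * exp(-S(x - mu)^2 / (2 sigma^2)); C is dropped.

    k is assumed even, so mu^k equals |mu|^k.
    """
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return k * math.log(abs(mu)) - (n + k) * math.log(sigma) - ss / (2 * sigma**2)


def maximizing_point(xs, k):
    """Closed-form (mu*, sigma*): + sign for a positive mean, - sign otherwise."""
    n = len(xs)
    xbar = sum(xs) / n
    sum_sq = sum(x * x for x in xs)
    a = xbar * (1 - k / n)
    root = math.sqrt(a * a + (4 * k / n**2) * sum_sq)
    mu = (a + root) / 2 if xbar > 0 else (a - root) / 2
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n + k))
    return mu, sigma


SAMPLE = [0.8, 1.9, -0.3, 1.1, 2.4, 0.6]  # made-up data
MU_STAR, SIGMA_STAR = maximizing_point(SAMPLE, 2)
```

A local check of this kind does not prove that the point is the global maximum, which the text asserts separately.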
We have

\[
R_2(E) = \frac{C (\mu^*)^k\, k^{(N+k)/2}}{(\mu^*)^{(N+k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}\, e^{-(N+k)/2}.
\]

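To find the region of the sample space for which we should accept the
hypothesis $\mu \ne 0$ (i.e. the critical region for rejection of the hypothesis $\mu = 0$), we
seek those points for which $R_1(E) \le R_2(E)$, i.e. those for which

\[
\frac{W N^{N/2}\, e^{-N/2}}{(S x_\alpha^2)^{N/2}} \;\le\; \frac{(\mu^*)^k\, k^{(N+k)/2}\, e^{-(N+k)/2}}{(\mu^*)^{(N+k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}
\]

or for which

\[
\frac{(\mu^*)^{(N-k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}{(S x_\alpha^2)^{N/2}} \;\le\; c
\]

where $c$ is a positive constant. Since $-S(x_\alpha - \mu^*) = N(\mu^* - \bar x)$ and both sides
of the inequality are positive, this inequality is equivalent to

\[
\frac{(\mu^*)^{N-k}\, (\mu^* - \bar x)^{N+k}}{(S x_\alpha^2)^{N}} \;\le\; c_1 \tag{1}
\]

where $c_1$ is another positive constant.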
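Now we consider the statistic

\[
T^2 = \frac{t^2}{N - 1} = \frac{N \bar x^2}{S x_\alpha^2 - N \bar x^2} = \frac{N}{S x_\alpha^2 / \bar x^2 - N}
\]

from which we have

\[
S x_\alpha^2 / \bar x^2 = (N / T^2) + N.
\]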
Also note that

\[
2(\mu^* / \bar x) = (1 - k/N) + \sqrt{(1 - k/N)^2 + (4k/N)(1 + 1/T^2)}
\]

(and this is true whether $\bar x$ is positive or negative). Now we can write the
critical region (1) as

\[
\frac{(\mu^*/\bar x)^{N-k}\, (\mu^*/\bar x - 1)^{N+k}}{(S x_\alpha^2 / \bar x^2)^{N}} \;\le\; c_1
\]

or

\[
\left[ 1 - k/N + \sqrt{(1 - k/N)^2 + (4k/N)(1 + 1/T^2)} \right]^{N-k} (1 + 1/T^2)^{-N}
\cdot \left[ -1 - k/N + \sqrt{(1 - k/N)^2 + (4k/N)(1 + 1/T^2)} \right]^{N+k} \;\le\; c_2
\]

where $c_2$ is another positive constant. We denote the left side of this inequality
by $\Phi(T^2)$, and it can be shown that $\Phi(T^2)$ is a monotone decreasing function of $T^2$.

Thus since the critical region is defined by the relation $\Phi(T^2) \le$ constant and













the critical region using “Student’s $t$” is $T^2 \ge$ constant, these procedures are
exactly equivalent.
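
The monotonicity of Φ, on which the equivalence rests, can be spot-checked numerically. The following sketch evaluates the logarithm of Φ(T²) as written above for assumed values N = 10 and k = 2 over a grid of T² values; the grid, the values of N and k, and the function name are illustrative choices only.

```python
import math


def log_phi(t_sq, n, k):
    """Logarithm of Phi(T^2), the left side of the critical-region inequality."""
    u = 1.0 + 1.0 / t_sq
    a = 1.0 - k / n
    b = 1.0 + k / n
    root = math.sqrt(a * a + (4.0 * k / n) * u)
    # root > b whenever t_sq is finite and positive, so both logarithms are defined.
    return (n - k) * math.log(a + root) + (n + k) * math.log(root - b) - n * math.log(u)


T_SQ_GRID = [0.05 * j for j in range(1, 401)]
LOG_PHI_VALUES = [log_phi(t_sq, 10, 2) for t_sq in T_SQ_GRID]
```

Working with the logarithm avoids overflow for larger N; a decreasing logarithm is equivalent to a decreasing Φ because Φ is positive.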


4. A problem in statistical decisions. The question which aroused the interest
of the writer in statistical decisions is the following one of multivariate statistical
analysis. Suppose $x_1, x_2, \dots, x_p$ are known to be normally distributed with
unknown means and unknown variances and covariances, and on the basis of
a set of $N$ independent observations, a test is to be made of the hypothesis
$E(x_1) = E(x_2) = \dots = E(x_p) = 0$. Such a test may be carried out by using
the generalized Student Ratio [4], and the hypothesis is either to be rejected or
accepted as a whole. But consider the case in which the null hypothesis is
rejected; it seems quite natural to ask for a more enlightening statement. Is it
not possible to say that on the basis of the sample the hypothesis should be
rejected for $x_{i_1}, x_{i_2}, \dots, x_{i_r}$ but not rejected for $x_{i_{r+1}}, x_{i_{r+2}}, \dots, x_{i_p}$? Thus
we seek a division of the sample space into $2^p$ mutually exclusive regions, each
of which will lead us to reject the hypothesis of zero expected values for a
particular set of the $x_i$’s and to accept it for the remaining set.

We will consider a solution of the problem in the case that the covariance 
matrix of the joint normal distribution is known, and will motivate that solution 
by considering first the case of two variables. 

Suppose that $X$ and $Y$ are normally and independently distributed with
unknown means, $\alpha$ and $\beta$, and with unit variances. The joint probability density
function is then of the form

\[
f(X, Y) = (1/2\pi)\, e^{-\frac{1}{2}[(X - \alpha)^2 + (Y - \beta)^2]}.
\]
The set of hypotheses is given as follows: 


$H_1$ is the hypothesis that $\alpha = 0$ and $\beta = 0$
$H_2$ is the hypothesis that $\alpha \ne 0$ and $\beta = 0$
$H_3$ is the hypothesis that $\alpha = 0$ and $\beta \ne 0$
$H_4$ is the hypothesis that $\alpha \ne 0$ and $\beta \ne 0$.


We have a sample of $N$ independent pairs of observations $(X_a, Y_a)$ where $a =
1, 2, \dots, N$; then the density function in the $2N$-dimensional sample space is

\[
(1/2\pi)^N\, e^{-\frac{1}{2} S[(X_a - \alpha)^2 + (Y_a - \beta)^2]}.
\]

We seek the set of regions $M_1, M_2, M_3, M_4$ in the sample space which are
chosen such that if the sample point $E$ falls in $M_i$, we accept the hypothesis $H_i$.


We take the following as the values of the losses if the wrong decision is reached:

I. If $H_1$ is accepted,
i) for any parameter point $(\alpha, \beta)$, the loss is a continuous function of
$(\alpha^2 + \beta^2)$, say $W(\alpha^2 + \beta^2)$, which is zero for $\alpha = \beta = 0$, is differentiable,
strictly monotonically increasing, and possesses a finite maximum
when multiplied by the normal density function.




II. If $H_2$ is accepted,
i) for any parameter point $(\alpha, \beta)$ except $(0, 0)$, the loss is $W(\beta^2)$ where
$W$ is the same function as above,
ii) the loss is $W_1$ if the true parameter point is $(0, 0)$.
III. If $H_3$ is accepted,
i) for any parameter point $(\alpha, \beta)$ except $(0, 0)$, the loss is $W(\alpha^2)$ where
$W$ is the same function as above,
ii) the loss is $W_1$ if the true parameter point is $(0, 0)$.
IV. If $H_4$ is accepted,
i) the loss is $W_2$ if the true parameter point is either $(\alpha, 0)$ for $\alpha \ne 0$,
or $(0, \beta)$ for $\beta \ne 0$,
ii) the loss is $W_3$ if the true parameter point is $(0, 0)$,
where $W_1$, $W_2$, and $W_3$ are constants subject to some slight restrictions which
will be pointed out later.
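The functions $R_i(\theta, E)$ are then the following:

\[
\begin{aligned}
R_1(\theta, E) &= W(\alpha^2 + \beta^2)\, G(\alpha, \beta) && \text{for } \alpha^2 + \beta^2 \ne 0 \\
&= 0 && \text{for } \alpha = \beta = 0 \\
R_2(\theta, E) &= W(\beta^2)\, G(\alpha, \beta) && \text{for } \beta \ne 0 \\
&= W_1 G(0, 0) && \text{for } \alpha = \beta = 0 \\
&= 0 && \text{for } \alpha \ne 0,\ \beta = 0 \\
R_3(\theta, E) &= W(\alpha^2)\, G(\alpha, \beta) && \text{for } \alpha \ne 0 \\
&= W_1 G(0, 0) && \text{for } \alpha = \beta = 0 \\
&= 0 && \text{for } \alpha = 0,\ \beta \ne 0 \\
R_4(\theta, E) &= W_2 G(\alpha, 0) && \text{for } \alpha \ne 0,\ \beta = 0 \\
&= W_2 G(0, \beta) && \text{for } \alpha = 0,\ \beta \ne 0 \\
&= W_3 G(0, 0) && \text{for } \alpha = \beta = 0 \\
&= 0 && \text{for } \alpha\beta \ne 0
\end{aligned}
\]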


where $G(\alpha, \beta)$ is the normal distribution function

\[
C\, e^{-\frac{1}{2} N [(x - \alpha)^2 + (y - \beta)^2]},
\]

$x$ and $y$ being the sample means. It should be pointed out that the use of the
distribution of the sample means instead of the joint distribution of the
observations is justified since the sample means are sufficient statistics for the parameters
$\alpha$ and $\beta$.
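
The sufficiency remark rests on the identity S(X_a − α)² = S(X_a − x)² + N(x − α)², where x is the sample mean: the first term does not involve α, so the joint density depends on α only through x. The short sketch below checks this identity numerically on made-up values; the data and the function names are illustrative assumptions.

```python
def sum_sq_about(values, center):
    """Sum of squared deviations of the values about `center`."""
    return sum((v - center) ** 2 for v in values)


def split_sum_sq(values, center):
    """Return (within, between): deviations about the mean, and N * (mean - center)^2."""
    n = len(values)
    mean = sum(values) / n
    return sum_sq_about(values, mean), n * (mean - center) ** 2


OBSERVATIONS = [1.2, -0.4, 0.9, 2.1, 0.3]  # made-up data
```

The same decomposition applies separately to the Y observations and β, which is why G(α, β) involves only the two sample means.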

We will use the notation $R_2(E)$ to denote the maximum of $R_2(\theta, E)$ with respect
to $\alpha$ and $\beta$, and it can easily be seen to be the maximum of two expressions which
we will denote by II(1) and II(2) where II(1) is the maximum of $W(\beta^2) G(\alpha, \beta)$
and II(2) is the maximum of $W_1 G(0, 0)$. Similarly, $R_3(E)$ is the maximum of
III(1) and III(2), and $R_4(E)$ is the maximum of IV(1), IV(2), and IV(3), where
these are the maxima of the two expressions involved in $R_3(\theta, E)$, and the three
expressions in $R_4(\theta, E)$, respectively.

We will first show that the function $R_1(E)$ is a monotonic increasing function
of $(x^2 + y^2)$. We know that the maximum of $R_1(\theta, E)$ is reached for values of













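$\alpha$ and $\beta$ for which the partial derivatives of $R_1(\theta, E)$ with respect to $\alpha$ and $\beta$
are zero, i.e., for which

\[
[N(x - \alpha) W(\alpha^2 + \beta^2) + 2\alpha W'(\alpha^2 + \beta^2)]\, G(\alpha, \beta) = 0
\]

and

\[
[N(y - \beta) W(\alpha^2 + \beta^2) + 2\beta W'(\alpha^2 + \beta^2)]\, G(\alpha, \beta) = 0
\]

where $W'(\alpha^2 + \beta^2)$ is the derivative of $W(\alpha^2 + \beta^2)$ with respect to $(\alpha^2 + \beta^2)$.
Since $G(\alpha, \beta) \ne 0$, and $W'(\alpha^2 + \beta^2) \ne 0$, these relations imply

\[
\frac{x - \alpha}{y - \beta} = \frac{\alpha}{\beta}
\]

or $\beta x = \alpha y$. Thus the maximum of the function $R_1(\theta, E)$ occurs for values of
$\alpha$ and $\beta$ which satisfy the relation $\alpha = (x/y)\beta$.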

Consider any two straight lines $\alpha = (x'/y')\beta$ and $\alpha = (x''/y'')\beta$, and the
values of the function $R_1(\theta, E)$ along these two lines. Obviously the values of
the first factor $W(\alpha^2 + \beta^2)$ are equal for points along the lines equidistant from
the origin. Also, if the values of $x'$, $y'$, $x''$, and $y''$ are such that $x'^2 + y'^2 =
x''^2 + y''^2$, the values of the function $G(\alpha, \beta)$ along both lines are equal for points
equidistant from the origin, and it follows that $R_1(x', y') = R_1(x'', y'')$. Thus
we have that $R_1(E)$ is a function of $(x^2 + y^2)$.

Note that if the value of $x''^2 + y''^2$ is greater than the value of $x'^2 + y'^2$, the
curve representing the function $G(\alpha, \beta)$ along $\alpha = (x''/y'')\beta$ is the same as that
along the line $\alpha = (x'/y')\beta$, but it is shifted further from the origin. The values
of $W(\alpha^2 + \beta^2)$ are independent of $x$ and $y$ and the function is monotonic in
$\alpha^2 + \beta^2$. Thus, the value of $G(\alpha, \beta)$ for which $R_1(\theta, E)$ is a maximum on $\alpha =
(x''/y'')\beta$ multiplies a larger value of $W(\alpha^2 + \beta^2)$ than on $\alpha = (x'/y')\beta$, so the
maximum when $x''^2 + y''^2$ exceeds $x'^2 + y'^2$ is the greater. But this proves that
$R_1(E)$ is monotonically increasing in $(x^2 + y^2)$.

In a similar manner, we now proceed to show that II(1) is a monotonically
increasing function of $y^2$. We know that a necessary condition for a maximum
of II(1) is that

\[
\frac{\partial\, \mathrm{II}(1)}{\partial \alpha} = \frac{\partial\, \mathrm{II}(1)}{\partial \beta} = 0.
\]

The first of these two relations is

\[
W(\beta^2)\, N (x - \alpha)\, G(\alpha, \beta) = 0
\]

which has the solutions $W(\beta^2) = 0$ and $\alpha = x$. But $W(\beta^2) = 0$ only for $\beta = 0$
and this value is a minimum of II(1), hence we have that the maximum is reached
for $\alpha = x$, so

\[
\mathrm{II}(1) = \text{max. of } W(\beta^2)\, C e^{-\frac{1}{2} N (y - \beta)^2}.
\]

But along any two lines $\alpha$ = constant in the $(\alpha, \beta)$-plane, the function $W(\beta^2)$
has identical monotonically increasing values in $\beta^2$ and the normal density







function is identical along two such lines for a fixed value of $y^2$. An increase in
the value of $y^2$ displaces the normal function from the origin but does not affect
its shape, hence the value of the normal density function at which II(1) takes on
its maximum is multiplied by a greater value of $W(\beta^2)$ when $y^2$ is increased, so
II(1) is monotonically increasing in $y^2$. In exactly the same manner, we find
that III(1) is a monotonically increasing function of $x^2$.

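Because the remaining functions are identical with the functions considered
in the special case above, we have that

\[
\begin{aligned}
\mathrm{II}(2) &= W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} \\
\mathrm{III}(2) &= W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} \\
\mathrm{IV}(1) &= W_2 C e^{-\frac{1}{2} N y^2} \\
\mathrm{IV}(2) &= W_2 C e^{-\frac{1}{2} N x^2} \\
\mathrm{IV}(3) &= W_3 C e^{-\frac{1}{2} N (x^2 + y^2)}.
\end{aligned}
\]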


Now it is apparent that $R_1(E)$ is never less than II(1) since

\[
W(\alpha^2 + \beta^2)\, G(\alpha, \beta) \ge W(\beta^2)\, G(\alpha, \beta)
\]

(the equality holds only for $\alpha = 0$) and since a function which is never less than
a second function cannot have a maximum less than the maximum of the second
function. Also $R_1(E)$ for the same reason is never less than III(1). Thus $R_1(E)$
can be the minimum of the four functions $R_i(E)$ at most when $R_2(E)$ is defined
by II(2) and $R_3(E)$ is defined by III(2).

Since II(2) and III(2) are the same monotonic decreasing function of $(x^2 +
y^2)$ and since $R_1(E)$ is a monotonic increasing function of $(x^2 + y^2)$, there is a
value $r_0^2$ of $(x^2 + y^2)$ such that $R_1(E) \le \mathrm{II}(2)$ when and only when $x^2 + y^2 \le r_0^2$.
But for all values $(x, y)$ we have that $R_1(E) \ge \mathrm{II}(1)$ and $R_1(E) \ge \mathrm{III}(1)$, hence
for all values within the circle $x^2 + y^2 = r_0^2$ we have that

\[
\mathrm{II}(1) \le R_1(E) \le \mathrm{II}(2) \tag{2}
\]

and

\[
\mathrm{III}(1) \le R_1(E) \le \mathrm{III}(2) \tag{3}
\]

so it follows that $R_2(E)$ is defined by II(2) and $R_3(E)$ is defined by III(2) within
the circle.


We restrict the values of $W_1$, $W_2$, and $W_3$ used in the definitions of the weight
functions to be $W_1 \le W_2 \le W_3$, hence for all values of $(x, y)$

\[
W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} \le W_2 C e^{-\frac{1}{2} N y^2},
\]

\[
W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} \le W_2 C e^{-\frac{1}{2} N x^2},
\]

and

\[
W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} \le W_3 C e^{-\frac{1}{2} N (x^2 + y^2)},
\]

so $R_4(E)$ is at least as great as II(2) over the whole plane; hence, in light of
relation (2), $R_4(E)$ is at least as great as $R_1(E)$ for $x^2 + y^2 \le r_0^2$. Therefore,
since (2) shows that $R_1(E) \le R_2(E)$ within the circle; (3) shows that $R_1(E) \le$




$R_3(E)$ within the circle; and since quite obviously the relations do not hold
outside the circle, we have that $M_1$ is the set of points

\[
x^2 + y^2 \le r_0^2.
\]
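
The radius of the circle bounding M₁ can be computed numerically once a weight function is fixed. The sketch below assumes, purely for illustration, W(t) = t, C = 1, N = 10 and W₁ = 1; none of these choices comes from the paper. With d the distance of the sample means from the origin, R₁ is the maximum over ρ ≥ 0 of ρ² exp(−N(ρ − d)²/2), taken along the line through the origin and (x, y), and the radius is the value of d at which this equals II(2) = W₁ exp(−N d²/2).

```python
import math


def r1(d, n):
    """Max over rho >= 0 of rho^2 * exp(-n (rho - d)^2 / 2), assuming W(t) = t, C = 1."""
    # Setting the derivative of the logarithm to zero gives n rho^2 - n d rho - 2 = 0.
    rho = (d + math.sqrt(d * d + 8.0 / n)) / 2.0
    return rho * rho * math.exp(-0.5 * n * (rho - d) ** 2)


def ii2(d, n, w1):
    """II(2) as a function of the distance d, with C = 1."""
    return w1 * math.exp(-0.5 * n * d * d)


def circle_radius(n, w1, hi=10.0, tol=1e-12):
    """Bisection for the d at which the increasing R_1 meets the decreasing II(2)."""
    lo = 0.0
    if r1(lo, n) >= ii2(lo, n, w1):
        return 0.0  # for this W_1 the region M_1 shrinks to nothing
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if r1(mid, n) < ii2(mid, n, w1):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


R0 = circle_radius(10, 1.0)
```

Other admissible weight functions would change the closed form used inside `r1`, but the bisection step relies only on R₁ increasing and II(2) decreasing in d, as shown in the text.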


To determine the region $M_2$, we must determine those points outside $M_1$ for
which $R_2(E) \le R_3(E)$ and $R_2(E) \le R_4(E)$. Consider first the part of the plane
outside $M_1$ for which $R_2(E)$ is defined by II(2). This is the region for which
II(2) > II(1). Consider the curve in the plane defined by II(2) = II(1), that is,

\[
W_1 C e^{-\frac{1}{2} N (x^2 + y^2)} = \mathrm{II}(1).
\]

We take differentials and have

\[
-N W_1 C e^{-\frac{1}{2} N (x^2 + y^2)}\, [x\, dx + y\, dy] = 2y\, [d\,\mathrm{II}(1)/d(y^2)]\, dy
\]

but this shows that $dy/dx$ has the opposite sign from $y/x$ since $d\,\mathrm{II}(1)/d(y^2)$ is
always positive. Also note that for $x = 0$, the equation $R_1(E) = \mathrm{II}(2)$ is
identical with the equation II(1) = II(2), so for $x = 0$, we have II(1) > II(2) when
$|y| > r_0$ and II(1) < II(2) when $|y| < r_0$. Furthermore, the curve II(1) =
II(2) crosses the $x$ axis at a finite value of $x$, since for $y = 0$, II(1) is a constant
while II(2) is a decreasing function of $x$.

We will refer to the various regions in the first quadrant of the $(x, y)$-plane
shown in Figure 1 as follows: $A$ is the part of the quadrant which is $M_1$; $A$, $B$,
$B'$, and $C$ are the regions in which $R_2(E)$ is defined by II(2), that is, in which
II(2) > II(1); and in the same manner, $A$, $B$, $B'$, and $C'$ are the regions in which
$R_3(E)$ is defined by III(2).

Since II(2) and III(2) are identical, we see that within the regions $B$ and $B'$,
$R_2(E) = R_3(E)$ since in these regions $R_2(E)$ is defined by II(2) and $R_3(E)$ is
defined by III(2). We have previously pointed out that II(2) is never greater
than $R_4(E)$, hence it is clear that $B$ and $B'$ should belong to either $M_2$ or $M_3$,
and we will arbitrarily decide that $B$ is part of $M_2$ and $B'$ part of $M_3$.

Consider then the region $C$; here $R_2(E)$ is defined by II(2) and $R_3(E)$ by III(1),
so within $C$

\[
\mathrm{II}(2) = \mathrm{III}(2) < \mathrm{III}(1) = R_3(E)
\]

and again $\mathrm{II}(2) \le R_4(E)$, so the region $C$ is part of $M_2$. By the same argument
we have that $C'$ is a part of $M_3$ since within $C'$

\[
\mathrm{III}(2) = \mathrm{II}(2) < \mathrm{II}(1) = R_2(E)
\]

and $\mathrm{III}(2) \le R_4(E)$.


Now consider the remainder of the quadrant outside .14, B, B’, C, and (’. 
Here R.(E) is defined by II(1) and R3(Z) is defined by I11(1). Since II(1) is 
the same monotone increasing function of y° as IIT(1) is of a, we have I[(1) > 
II(1) for|y| > |x!) and I1(1) < TIC) for|7' > y.. Thus we see that in 
the region under discussion, R.(Z) is a minimum at most in the regions D and 
E and R;(E) a minimum at most in D’ and E’. 





FIG. 1


In order to determine, then, that part of $D$ and $E$ which belongs to $M_2$, we seek the region for which

II(1) < IV(1) when $R_4(E)$ is defined by IV(1)
II(1) < IV(2) when $R_4(E)$ is defined by IV(2)
II(1) < IV(3) when $R_4(E)$ is defined by IV(3).

But within $D$ and $E$ we have that $y^2 < x^2$, so it follows that IV(1) > IV(2), so $R_4(E)$ is never defined by IV(2) in $D$ or $E$. Hence we need determine only the points which satisfy the first and third of these relations. Now it is clear that the relation II(1) < IV(1) is equivalent to the relation $|y| < y_0$ for some value $y_0$, since II(1) is monotonically increasing in $y^2$ and IV(1) is monotonically decreasing in $y^2$. Let $y = y_0$ be the line dividing $D$ and $E$.

We impose a restriction on $W_3$ such that $D$ is part of $M_2$ and $E$ is part of $M_4$.











This restriction is that within $E$, IV(3) < IV(1); note that since we are concerned only with $|y| < |x|$, this imposes the greatest restriction on $W_3$ when $x = y = y_0$, so we are requiring that

$$W_3 C e^{-\frac{1}{2}N(y_0^2+y_0^2)} \le W_2 C e^{-\frac{1}{2}N y_0^2}$$

or

$$W_3 \le W_2 e^{\frac{1}{2}N y_0^2}.$$

It is simple to see that because of symmetry with respect to both axes and the origin, $M_2$ is defined by $x^2 + y^2 > r_0^2$ and $|y| < |x|$ and $|y| < y_0$; $M_3$ by $x^2 + y^2 > r_0^2$ and $|x| < |y|$ and $|x| < x_0$; and $M_4$ by $x^2 + y^2 > r_0^2$ and $|y| > y_0$ and $|x| > x_0$. It should be pointed out that $x_0 = y_0$.
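The inequalities defining the four regions can be checked mechanically for a given sample point. The following is a minimal sketch only: the function name and the numerical thresholds `r0` and `y0` are hypothetical inputs (the text does not give their values), and points lying exactly on a dividing line are reported as `"boundary"` because the strict inequalities above leave them unassigned.

```python
def bivariate_region(x: float, y: float, r0: float, y0: float) -> str:
    """Classify the sample point (x, y) using the stated inequalities.

    r0 is the radius of the circle bounding M_1 and y0 (= x0) is the
    threshold on the smaller coordinate; both are assumed to be known.
    """
    if x * x + y * y < r0 * r0:
        return "M1"                      # inside the circle
    if abs(y) < abs(x) and abs(y) < y0:
        return "M2"                      # |y| < |x| and |y| < y0
    if abs(x) < abs(y) and abs(x) < y0:
        return "M3"                      # |x| < |y| and |x| < x0, with x0 = y0
    if abs(x) > y0 and abs(y) > y0:
        return "M4"                      # both coordinates exceed the threshold
    return "boundary"                    # ties are not covered by strict inequalities


if __name__ == "__main__":
    for point in [(0.1, 0.1), (3.0, 0.5), (0.5, 3.0), (3.0, 4.0)]:
        print(point, bivariate_region(*point, r0=1.0, y0=2.0))
```

With the illustrative thresholds above, the four test points fall in $M_1$, $M_2$, $M_3$ and $M_4$ respectively.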

We now consider the general case with a known covariance matrix. Consider the joint normally distributed variates $X_1', X_2', \cdots, X_p'$ whose covariance matrix is $\|\sigma_{ij}'\|$ $(i, j = 1, 2, \cdots, p)$, where the $\sigma_{ij}'$'s are all known and where $\|\sigma_{ij}'\|$ is positive definite. The mean values of the $X_i'$'s are $\beta_1, \beta_2, \cdots, \beta_p$, which are unknown. It is simple to see that we can consider new variates $X_i = X_i'/\sqrt{\sigma_{ii}'}$ whose mean values are $a_i = \beta_i/\sqrt{\sigma_{ii}'}$ and whose covariance matrix is $\|\sigma_{ij}\|$ where $\sigma_{ii} = 1$. If a sample of $N$ independent observations on the $X_i'$'s is given, we have immediately the observations on the $X_i$'s, and we denote the sample means of the $X_i$'s by $x_1, x_2, \cdots, x_p$, respectively.
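The change of variates described above amounts to dividing each mean by the known standard deviation and converting the covariance matrix to a correlation matrix. A minimal sketch, assuming the raw sample means and the known covariance matrix are held in plain Python lists (the function and argument names are illustrative, not from the paper):

```python
import math


def standardize(raw_means, raw_cov):
    """Rescale so that every variate has unit variance.

    raw_means[i] is the sample mean of X'_i and raw_cov[i][j] is the
    known covariance sigma'_ij.  Returns the means x_i and the matrix
    ||sigma_ij||, whose diagonal elements are all 1.
    """
    p = len(raw_means)
    scale = [math.sqrt(raw_cov[i][i]) for i in range(p)]
    means = [raw_means[i] / scale[i] for i in range(p)]
    cov = [[raw_cov[i][j] / (scale[i] * scale[j]) for j in range(p)]
           for i in range(p)]
    return means, cov


if __name__ == "__main__":
    means, cov = standardize([2.0, 3.0], [[4.0, 2.0], [2.0, 9.0]])
    print(means)  # each mean divided by its standard deviation
    print(cov)    # unit diagonal, off-diagonal 2 / (2 * 3)
```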

There are $2^p$ hypotheses among which we wish to choose; as notation, we let

$H_0$ be $a_1 = a_2 = \cdots = a_p = 0$
$H_1$ be $a_1 \neq 0$, $a_2 = a_3 = \cdots = a_p = 0$
$H_2$ be $a_2 \neq 0$, $a_1 = a_3 = \cdots = a_p = 0$
$H_{12}$ be $a_1 a_2 \neq 0$, $a_3 = a_4 = \cdots = a_p = 0$

etc. As a further abbreviation, let $H^1$ denote any one of the $p$ hypotheses $H_1, H_2, \cdots, H_p$; let $H^2$ denote any of the $\binom{p}{2}$ hypotheses $H_{12}, H_{13}, \cdots$; $H^3$ denote any of the $\binom{p}{3}$ hypotheses $H_{123}, H_{124}, \cdots$; etc. Also let $M_{i_1 i_2 \cdots i_k}$ be the region of the sample space for which we accept the hypothesis $H_{i_1 i_2 \cdots i_k}$, and let $R_{i_1 i_2 \cdots i_k}(\theta, E) = W(\theta, H_{i_1 i_2 \cdots i_k}) f(E \mid \theta)$ be the risk density function if the hypothesis $H_{i_1 i_2 \cdots i_k}$ is chosen, where we have used the notation $\theta$ to represent the parameter point $a_1, a_2, \cdots, a_p$.
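The $2^p$ hypotheses can be listed by the set of subscripts whose means are allowed to differ from zero. A small sketch (the representation of a hypothesis as a tuple of subscripts is a convention chosen here for illustration):

```python
from itertools import combinations


def enumerate_hypotheses(p: int):
    """Return every hypothesis as a tuple of the subscripts i with a_i != 0.

    The empty tuple stands for H_0, (1,) for H_1, (1, 2) for H_12, and so on.
    The hypotheses of class H^k are the tuples of length k.
    """
    return [subset
            for k in range(p + 1)
            for subset in combinations(range(1, p + 1), k)]


if __name__ == "__main__":
    for h in enumerate_hypotheses(3):
        print("H_" + ("".join(map(str, h)) or "0"))
```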

We will also adopt the following notations: in referring to the parameter point $(a_1, a_2, \cdots, a_p)$, we will write $(i_1, i_2, \cdots, i_k) = 0$ to mean all points for which $a_{i_1} = a_{i_2} = \cdots = a_{i_k} = 0$ and $(a_{j_1})(a_{j_2}) \cdots (a_{j_s}) \neq 0$, where $i_1, i_2, \cdots, i_k, j_1, j_2, \cdots, j_s$ are a permutation of the integers $1, 2, \cdots, p$. Furthermore, we will write $[j_1, j_2, \cdots, j_s] \neq 0$ to mean $(i_1, i_2, \cdots, i_k) = 0$.

By $Q$ we denote the covariance matrix of the $X_i$'s and by $L$ its inverse; we will denote the elements of $L$ by $\lambda_{ij}$. By $Q^{i_1 i_2 \cdots i_k}$ we denote the matrix obtained by striking out rows $i_1, i_2, \cdots, i_k$ and columns $i_1, i_2, \cdots, i_k$ from $Q$; by $L^{i_1 i_2 \cdots i_k}$ we denote the inverse of the matrix $Q^{i_1 i_2 \cdots i_k}$, and we will write the elements of $L^{i_1 i_2 \cdots i_k}$ as $\lambda_{ij}^{i_1 i_2 \cdots i_k}$. Thus we can write the joint distribution of the set of sample means $x_1, x_2, \cdots, x_p$ as

$$C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)}. \tag{4}$$


Concerning the definition of the weight function, we will assume the following:

I. If $H_0$ is accepted,

i) the loss is $W(\sum\sum\lambda_{ij}a_ia_j)$ if the true parameter point is $(a_1, a_2, \cdots, a_p)$, where $W$ is a continuous, strictly monotonic increasing function whose value is zero if $(1, 2, \cdots, p) = 0$. The function is restricted to increase slowly enough that the product of it and the density function (4) has a finite maximum with respect to the $a_i$'s.

II. If $H^1$ is accepted,

i) consider in particular $H_1$; then for all parameter points except $(1, 2, \cdots, p) = 0$, the value of the loss is $W(\sum\sum\lambda_{ij}^{1}a_ia_j)$, where $W$ is the function defined above,

ii) the loss is $W_0^1$ if the true parameter point is $(1, 2, \cdots, p) = 0$.

III. If $H^2$ is accepted,

i) consider in particular $H_{ab}$; then for all parameter points except $(1, 2, \cdots, p) = 0$ and $[a] \neq 0$ and $[b] \neq 0$, the loss is $W(\sum\sum\lambda_{ij}^{ab}a_ia_j)$, where $W$ is the function defined above,

ii) the loss is $W_1^2$ if the true parameter point is either $[a] \neq 0$ or $[b] \neq 0$, where $W_0^1 < W_1^2$,

iii) the loss is $W_0^2$ if the true parameter point is $(1, 2, \cdots, p) = 0$, where $W_0^2 \ge W_1^2$.

In general, if $H^k$ is accepted,

i) consider in particular $H_{i_1 i_2 \cdots i_k}$; then for all parameter points except $(1, 2, \cdots, p) = 0$, $[i_1] \neq 0$, $[i_2] \neq 0$, $\cdots$, $[i_1, i_2] \neq 0$, $[i_1, i_3] \neq 0$, $\cdots$ etc., the loss is $W(\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_k}a_ia_j)$,

ii) the loss is $W_r^k$ $(r = 1, 2, \cdots, k - 1)$ if $[i_{j_1}, i_{j_2}, \cdots, i_{j_r}] \neq 0$, where $j_1, j_2, \cdots, j_r$ are $r$ different positive integers less than or equal to $k$.
Also Wi. < Wks S --- S Wi, Wize S Wie, Wis S Wig S Wis, 
etc. 


iii) the loss is $W_0^k$ if $(1, 2, \cdots, p) = 0$, where $W_0^k \ge W_1^k$,

where the $W_r^k$ are constants subject to some further slight restrictions which we will impose later. The $\sum\sum$ has been used throughout to denote summation over all values which $i$ and $j$ take on in $L^{i_1 i_2 \cdots i_k}$.

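We consider first the risk density function corresponding to $H_0$, that is

$$R_0(\theta, E) = W\left(\sum\sum\lambda_{ij}a_ia_j\right) C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)}.$$

To maximize $R_0(\theta, E)$, we have the set of $p$ equations obtained by setting the $p$ partials of $R_0(\theta, E)$ with respect to the $a_i$ equal to zero, which are necessary conditions. We have

$$\frac{\partial R_0(\theta, E)}{\partial a_i} = \left\{\frac{\partial W}{\partial a_i} + \left[N\sum_j\lambda_{ij}(x_j - a_j)\right]W\right\} C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)}$$

so the necessary conditions are

$$\frac{\partial W}{\partial a_i} + \left[N\sum_j\lambda_{ij}(x_j - a_j)\right]W = 0.$$

This can also be written

$$\left(2\sum_j\lambda_{ij}a_j\right)D_zW(z) + W(z)\,N\sum_j\lambda_{ij}(x_j - a_j) = 0$$

where we have set $z = \sum\sum\lambda_{ij}a_ia_j$ and where we use the notation $D_z$ to indicate differentiation with respect to $z$. Fix $i$ at two particular values, say $a$ and $b$; then two of the equations of this system can be written

$$\left(2\sum_j\lambda_{aj}a_j\right)D_zW(z) + W(z)\,N\sum_j\lambda_{aj}(x_j - a_j) = 0$$
$$\left(2\sum_j\lambda_{bj}a_j\right)D_zW(z) + W(z)\,N\sum_j\lambda_{bj}(x_j - a_j) = 0$$

so that, eliminating $D_zW(z)$ and $W(z)$,

$$\left(\sum_j\lambda_{aj}a_j\right)\left[\sum_j\lambda_{bj}(x_j - a_j)\right] = \left(\sum_j\lambda_{bj}a_j\right)\left[\sum_j\lambda_{aj}(x_j - a_j)\right]$$

$$\left(\sum_j\lambda_{aj}a_j\right)\left(\sum_j\lambda_{bj}x_j\right) = \left(\sum_j\lambda_{bj}a_j\right)\left(\sum_j\lambda_{aj}x_j\right).$$

This we can write as

$$\sum_j\sum_k\lambda_{aj}\lambda_{bk}a_jx_k = \sum_j\sum_k\lambda_{bk}\lambda_{aj}a_kx_j$$

or

$$\sum_j\sum_k\lambda_{aj}\lambda_{bk}(a_jx_k - a_kx_j) = 0.$$

Giving $a$ and $b$ the $p^2$ combinations of values which are possible, this is a set of $p^2$ linear homogeneous equations in the $p^2$ unknowns $(a_jx_k - a_kx_j)$ which has the obvious solution $a_jx_k - a_kx_j = 0$ or $a_jx_k = a_kx_j$.

Thus we have that the maximum of the function $R_0(\theta, E)$ is reached for a set of values of the $a_i$'s which lie on the straight line

$$a_i = (x_i/x_1)a_1. \tag{5}$$

The function $R_0(E)$, which is the maximum of $R_0(\theta, E)$ with respect to the $a_i$'s, is a monotonically increasing function of $\sum\sum\lambda_{ij}x_ix_j$, which we show in the following manner. Because of (5), we see that

$$\sum\sum\lambda_{ij}(x_i - a_i)(x_j - a_j) = \sum\sum\lambda_{ij}[x_i - (x_i/x_1)a_1][x_j - (x_j/x_1)a_1] = \sum\sum\lambda_{ij}x_ix_j\,[1 - (a_1/x_1)]^2.$$

Also,

$$\sum\sum\lambda_{ij}a_ia_j = \sum\sum\lambda_{ij}x_ix_j\,(a_1/x_1)^2.$$

Hence we see that $R_0(E)$ is the maximum with respect to $w$ of

$$W\left(w^2\sum\sum\lambda_{ij}x_ix_j\right) C e^{-\frac{1}{2}N(1-w)^2\sum\sum\lambda_{ij}x_ix_j}$$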








so for two sample points $E' = (x_1', x_2', \cdots, x_p')$ and $E'' = (x_1'', x_2'', \cdots, x_p'')$ such that $\sum\sum\lambda_{ij}x_i'x_j' = \sum\sum\lambda_{ij}x_i''x_j''$, it is clear that $R_0(E') = R_0(E'')$; thus $R_0(E)$ is a function of $\sum\sum\lambda_{ij}x_ix_j$.

But then without loss of generality, we can consider $R_0(E)$ along the $x_1$ axis, i.e. for $x_2 = x_3 = \cdots = x_p = 0$. Using relation (5), we see that this implies that the maximizing parameter values are $a_2 = a_3 = \cdots = a_p = 0$. But then

$$R_0(E) = \max_{a_1}\; W(\lambda_{11}a_1^2)\, C e^{-\frac{1}{2}N\lambda_{11}(x_1-a_1)^2}$$

which we have previously shown is a monotonic increasing function of $x_1^2$. Therefore $R_0(E)$ is a monotonic increasing function of $\sum\sum\lambda_{ij}x_ix_j$.

We will furthermore show that the maximum of each risk density function corresponding to parts i) as given in the weight functions is a monotonically increasing function of a certain quadratic form in the $x_i$. Consider for example the function corresponding to part i) of $R_1(\theta, E)$, that is

$$W\left(\sum\sum\lambda_{ij}^{1}a_ia_j\right) C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)}. \tag{6}$$

We will write the maximum of this function with respect to the $a_i$'s as $R_1(i)$. Note that the weight function is not a function of $a_1$, hence the partial derivative of (6) with respect to $a_1$ set equal to zero is equivalent to

$$\sum_j\lambda_{1j}(x_j - a_j) = 0.$$

Squaring this relation and multiplying by $N/2\lambda_{11}$ gives

$$(N/2\lambda_{11})\sum\sum\lambda_{1i}\lambda_{1j}(x_i - a_i)(x_j - a_j) = 0$$

so we can write the exponent in (6)

$$\text{Exp.} = -(N/2\lambda_{11})\sum\sum(\lambda_{11}\lambda_{ij} - \lambda_{1i}\lambda_{1j})(x_i - a_i)(x_j - a_j).$$

Because of the definition of $\lambda_{ij}$, if we write $\omega_{ij}$ for the cofactor of $\sigma_{ij}$ in $|\sigma_{ij}|$, we have

$$\text{Exp.} = -(N/2\lambda_{11}|\sigma_{ij}|^2)\sum\sum(\omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j})(x_i - a_i)(x_j - a_j).$$

But by a well known algebraic identity²,

$$\omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j} = |\sigma_{ij}|\,[\text{cofactor of } (\sigma_{11}\sigma_{ij} - \sigma_{1i}\sigma_{1j}) \text{ in } |\sigma_{ij}|] = |\sigma_{ij}|\,\omega_{ij}^{1}$$

where we have written $\omega_{ij}^{1}$ to be the cofactor of $\sigma_{ij}$ in $|\sigma_{ij}^{1}|$, so

$$\text{Exp.} = -(N/2\lambda_{11}|\sigma_{ij}|)\sum\sum\omega_{ij}^{1}(x_i - a_i)(x_j - a_j).$$

But $\lambda_{11}|\sigma_{ij}| = \omega_{11} = |\sigma_{ij}^{1}|$, hence

$$\text{Exp.} = -\frac{N}{2}\sum\sum\lambda_{ij}^{1}(x_i - a_i)(x_j - a_j).$$

Therefore

$$R_1(i) = \max_{\text{all } a_i\text{'s}}\; W\left(\sum\sum\lambda_{ij}^{1}a_ia_j\right) C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}^{1}(x_i-a_i)(x_j-a_j)}.$$

² See M. Bôcher, Introduction to Higher Algebra.


But then it follows in exactly the same way as with $R_0(E)$ that $R_1(i)$ is a monotonically increasing function of $\sum\sum\lambda_{ij}^{1}x_ix_j$. For the other functions $R_r(i)$ corresponding to other hypotheses $H^1$, the argument is identical, and for risk density functions corresponding to hypotheses with more than one $a_i \neq 0$, the same argument is repeated two or more times in succession to give the result.

We will show that for any value of the parameters $a_1, a_2, \cdots, a_p$ the relation

$$\sum\sum\lambda_{ij}a_ia_j \ge \sum\sum\lambda_{ij}^{1}a_ia_j$$

holds. This relation is true if the relation

$$\sum\sum\left[(\omega_{ij}/|\sigma_{ij}|) - (\omega_{ij}^{1}/|\sigma_{ij}^{1}|)\right]a_ia_j \ge 0 \tag{7}$$

is true, where we define $\omega_{1j}^{1} = 0$. That is, if

$$(1/|\sigma_{ij}||\sigma_{ij}^{1}|)\sum\sum\left[\omega_{ij}\omega_{11} - \omega_{ij}^{1}|\sigma_{ij}|\right]a_ia_j \ge 0$$

where we have substituted $\omega_{11}$ for its equal $|\sigma_{ij}^{1}|$. But note that

$$\omega_{ij}^{1} = \text{cofactor of } (\sigma_{11}\sigma_{ij} - \sigma_{1i}\sigma_{1j}) \text{ in } |\sigma_{ij}|$$

hence by the identity quoted (see footnote 2)

$$|\sigma_{ij}|\,\omega_{ij}^{1} = \omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j}$$

so the left hand member of relation (7) is

$$(1/|\sigma_{ij}||\sigma_{ij}^{1}|)\sum\sum(\omega_{ij}\omega_{11} - \omega_{11}\omega_{ij} + \omega_{1i}\omega_{1j})a_ia_j = (1/|\sigma_{ij}||\sigma_{ij}^{1}|)\sum\sum\omega_{1i}\omega_{1j}a_ia_j = \left[\sum\omega_{1i}a_i\right]^2/(|\sigma_{ij}||\sigma_{ij}^{1}|) \ge 0$$

since all matrices here are symmetric and positive definite. Note that the argument can be repeated one or more times to show

$$W\left(\sum\sum\lambda_{ij}a_ia_j\right) \ge W\left(\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_k}a_ia_j\right)$$

or

$$W\left(\sum\sum\lambda_{ij}^{j_1 j_2 \cdots j_s}a_ia_j\right) \ge W\left(\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_k}a_ia_j\right)$$

where $i_1, i_2, \cdots, i_k$ are any set of $k$ different integers less than or equal to $p$, and $j_1, j_2, \cdots, j_s$ are any subset of $i_1, i_2, \cdots, i_k$.
Consider the maximum of the expressions

$$W_r^k C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)}.$$

We know that $(p - r)$ of the $a_i$'s in these expressions are zero and by an argument similar to that given above³, it is clear that if the $r$ $a_i$'s not equal to zero are $a_{i_1}, a_{i_2}, \cdots, a_{i_r}$, then the maximum of the expressions is given by

$$W_r^k C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_r}x_ix_j}.$$

Also for $r = 0$, the maximum is obviously

$$W_0^k C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j}.$$

³ See p. 36.


Recall that we have restricted the W*’s so that 


9 


ro“ yl 72 y 
(8) Wo = Wo S:-- S Wo and Wiss 
From a previous calculation, it follows that

$$\sum\sum\lambda_{ij}x_ix_j \ge \sum\sum\lambda_{ij}^{i_1}x_ix_j \ge \sum\sum\lambda_{ij}^{i_1 i_2}x_ix_j \ge \cdots \tag{9}$$

We can then quite easily calculate the region $M_0$, that is, the region of the sample space for which $R_0(E)$ is the minimum of all the $R_{i_1 i_2 \cdots i_k}(E)$'s. We have pointed out that

$$W\left(\sum\sum\lambda_{ij}a_ia_j\right) \ge W\left(\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_k}a_ia_j\right)$$

so it follows that

$$R_0(E) \ge R_{i_1 i_2 \cdots i_k}(i) \tag{10}$$

that is

$$R_0(E) \ge R_{i_1 i_2 \cdots i_k}(E)$$

so long as $R_{i_1 i_2 \cdots i_k}(E)$ is defined by $R_{i_1 i_2 \cdots i_k}(i)$.

From the relations (8) and (9), we have that

$$W_0^1 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j} \le W_r^k C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}^{i_1 i_2 \cdots i_r}x_ix_j} \tag{11}$$

for $k = 2, 3, \cdots, p$. Now because

$$W_0^1 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j}$$

is a monotonic decreasing function of $\sum\sum\lambda_{ij}x_ix_j$, and because $R_0(E)$ is a monotonically increasing function of $\sum\sum\lambda_{ij}x_ix_j$, there is a value $r_0$ such that within the ellipse $\sum\sum\lambda_{ij}x_ix_j = r_0^2$, the relation

$$R_0(E) < W_0^1 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j} \tag{12}$$

holds, and outside it the opposite inequality holds. But from relations (10) and (12), it follows that within this ellipse, no $R_{i_1 i_2 \cdots i_k}(E)$ except $R_0(E)$ can be defined by $R_{i_1 i_2 \cdots i_k}(i)$. Then in view of relation (11) and since a quantity is certainly less than the maximum of several quantities if it is less than one of those several quantities, the region $M_0$ is the set of points $\sum\sum\lambda_{ij}x_ix_j < r_0^2$.

Now consider the functions $R_r(E)$ in the region outside $M_0$. We know that $R_r(E) = R_r(i)$ when

$$\max_{a_i\text{'s}}\; W\left(\sum\sum\lambda_{ij}^{r}a_ia_j\right) C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}(x_i-a_i)(x_j-a_j)} > W_0^1 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j}$$

and we will write $R_r(E) = R_r(ii)$ when the opposite inequality holds. Consider a part of the sample space outside $M_0$ in which

$$R_{i_s}(E) = R_{i_s}(ii) \qquad (s = 1, 2, \cdots, k)$$

where $k \ge 1$, and where $R_j(E) \neq R_j(ii)$ for $j \neq i_1, i_2, \cdots, i_k$. We see in this case that $R_{i_1}(E) = R_{i_2}(E) = \cdots = R_{i_k}(E) < R_j(E)$, where again $j \neq i_1, i_2, \cdots, i_k$. Furthermore, in this case, because of the relation (11), we have that $E$ should be a point of either $M_{i_1}, M_{i_2}, \cdots$, or $M_{i_k}$. We will arbitrarily decide in this case that $E$ should be a point of $M_{i_s}$ ($s$ an integer $\le k$) where $i_s$ is determined so that

$$\sum\sum\lambda_{ij}^{i_s}x_ix_j \le \sum\sum\lambda_{ij}^{i_t}x_ix_j \quad \text{for any } t = 1, 2, \cdots, k.$$

Now consider the region in which $R_r(E) = R_r(i)$ for all $r = 1, 2, \cdots, p$. We see that each $R_r(i)$ is the same monotonically increasing function of a quadratic form of the type $\sum\sum\lambda_{ij}^{r}x_ix_j$. Hence in order that $E$ be a point of a particular $M_r$, it is necessary that

$$\sum\sum\lambda_{ij}^{r}x_ix_j \le \sum\sum\lambda_{ij}^{s}x_ix_j \quad \text{for all } s \neq r. \tag{13}$$


Now let us consider a fixed $r$ and compare $R_r(i)$ with all $R_{r i_2 \cdots i_k}(E)$'s for $k \ge 2$. We have pointed out that

$$\sum\sum\lambda_{ij}^{r i_2 \cdots i_k}x_ix_j \le \sum\sum\lambda_{ij}^{r}x_ix_j, \tag{14}$$

so $R_r(i) \ge R_{r i_2 i_3 \cdots i_k}(i)$ and hence $R_r(i)$ can be a minimum at most when all $R_{r i_2 \cdots i_k}(E)$'s are defined by other than $R_{r i_2 i_3 \cdots i_k}(i)$.

Consider then any $R_{r i_2}(E)$ when defined by other than $R_{r i_2}(i)$, that is, when $R_{r i_2}(E)$ is equal to one of

$$W_1^2 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}^{r}x_ix_j} = R_{r i_2}(ii) \text{ (say)}$$
$$W_1^2 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}^{i_2}x_ix_j} = R_{r i_2}(iii) \text{ (say)}$$
$$W_0^2 C e^{-\frac{1}{2}N\sum\sum\lambda_{ij}x_ix_j} = R_{r i_2}(iv) \text{ (say)}.$$

Because of the relations (8) and (14), we have that

$$R_{r i_2}(E) \le R_{r i_2 i_3 \cdots i_k}(E)$$

whenever these are defined by other than $R_{r i_2}(i)$ and $R_{r i_2 i_3 \cdots i_k}(i)$. Furthermore, in the region defined by (13), we see that $R_{r i_2}(ii) \ge R_{r i_2}(iii)$, hence $R_{r i_2}(E)$ is never defined by $R_{r i_2}(iii)$ in this region.

Now the relation $R_r(i) < R_{r i_2}(ii)$ is easily seen to be equivalent to the relation

$$\sum\sum\lambda_{ij}^{r}x_ix_j < r_1^2 \tag{15}$$

for some value $r_1$. With the restriction on $W_0^2$ that it be not so much larger than $W_1^2$ that when (12) does not hold, $R_{r i_2}(E)$ is not defined by $R_{r i_2}(iv)$, we have that the region for which $R_r(i) < R_{r i_2 i_3 \cdots i_k}(E)$ is the region defined by (13) and (15).

We then restrict the relationship between the constants Wj} and W; to be 
such that for all points outside of Mo but within the region defined by (13) and 
(15), the relation TINY? ee, = LLAj ji; holds for j;, jo, +++, je each 
different from r. Note that this is not an unreasonable restriction since the right 
hand side of the relation is bounded above by rj , SDA; ;:x ; is bounded below by 
r,, and therefore, YDN4}!?"""’*x;2; is bounded below by some positive value 7° 
where 7° is a monotonically increasing function of 77 . 

Using a similar method, the region $M_{i_1 i_2 \cdots i_k}$ can be obtained after all regions $M_{i_1 i_2 \cdots i_m}$ for all $m < k$ have been derived. If some further restrictions are imposed on the constants in the weight functions similar to those formulated in deriving the region $M_r$, it can be shown that the region $M_{i_1 i_2 \cdots i_k}$ $(k \ge 2)$ will be given by the inequalities

$$\sum\sum\lambda_{ij}x_ix_j \ge r_0^2$$

$$\sum\sum\lambda_{ij}^{j_1 \cdots j_m}x_ix_j \ge r_m^2 \quad \text{for all } m < k \text{ and all } j_1, \cdots, j_m$$

$$\sum\sum\lambda_{ij}^{i_1 \cdots i_k}x_ix_j \le \sum\sum\lambda_{ij}^{j_1 \cdots j_k}x_ix_j \quad \text{for all } j_1, \cdots, j_k$$

and

$$\sum\sum\lambda_{ij}^{i_1 \cdots i_k}x_ix_j < r_k^2.$$

Thus we have rationalized the following solution of the question posed at the beginning of section 4. We test the hypothesis $E(x_1) = E(x_2) = \cdots = E(x_p) = 0$ using the generalized Student ratio, replacing the sample covariance matrix by the population covariance matrix since the latter is assumed to be known, at some chosen level of significance. If the hypothesis is not rejected, we make the decision corresponding to $H_0$. If the ratio is significant, we compute the ratios $T^1, T^2, \cdots, T^p$, where by definition $T^{i_1 i_2 \cdots i_k}$ is the generalized Student ratio computed for $x_{j_1}, x_{j_2}, \cdots, x_{j_s}$ ($i_1, i_2, \cdots, i_k, j_1, j_2, \cdots, j_s$ is a permutation of the integers $1, 2, \cdots, p$), the variates $x_{i_1}, x_{i_2}, \cdots, x_{i_k}$ being ignored.

We consider the smallest of the ratios computed on the basis of $(p - 1)$ of the $x_i$'s; say it is $T^r$. Then if $T^r$ is not significant at some level of significance (which need not be the same level as considered before), we make the decision corresponding to $H_r$; if $T^r$ is significant, we compute all the ratios based on $(p - 2)$ of the $x_i$'s. If $T^{st}$ is the smallest of these, we make the decision corresponding to $H_{st}$ if $T^{st}$ is not significant, but proceed to calculate the ratios based on $(p - 3)$ of the $x_i$'s if it is significant, and so on.
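The stepwise rule just described can be sketched in code. This is an illustration under stated assumptions, not the author's computation: the generalized Student ratio with known covariance is taken to be $N x_S' Q_S^{-1} x_S$ for the retained variates $S$, the critical values in `critical` are supplied by the user (one per number of retained variates), and all function names are hypothetical.

```python
from itertools import combinations


def invert(matrix):
    """Gauss-Jordan inverse of a small nonsingular matrix (list of lists)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def student_ratio(means, cov, keep, n_obs):
    """Assumed form of the ratio: N * x' Q^{-1} x over the kept (0-based) indices."""
    inv = invert([[cov[i][j] for j in keep] for i in keep])
    x = [means[i] for i in keep]
    size = len(x)
    return n_obs * sum(inv[a][b] * x[a] * x[b]
                       for a in range(size) for b in range(size))


def choose_hypothesis(means, cov, n_obs, critical):
    """Return the 1-based subscripts of the accepted hypothesis; () means H_0.

    critical[s] is the user-chosen critical value for a ratio on s variates.
    """
    p = len(means)
    everything = tuple(range(p))
    if student_ratio(means, cov, everything, n_obs) <= critical[p]:
        return ()
    for ignored_count in range(1, p):
        kept = p - ignored_count
        best = min(combinations(everything, kept),
                   key=lambda keep: student_ratio(means, cov, keep, n_obs))
        if student_ratio(means, cov, best, n_obs) <= critical[kept]:
            return tuple(i + 1 for i in everything if i not in best)
    return tuple(i + 1 for i in everything)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    crit = {1: 3.841, 2: 5.991}  # upper 5% points of chi-square, 1 and 2 d.f.
    print(choose_hypothesis([1.0, 0.05], identity, 10, crit))
```

At each stage the smallest ratio over all subsets of the given size is examined, so the accepted subscripts are exactly the variates ignored by the first non-significant ratio.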


5. Concluding remarks. It should be pointed out that while the derivation of the explicit inequalities defining the various regions of acceptance may be rather involved, for any given sample point $E$, it is relatively simple to determine the region of acceptance to which this point $E$ belongs. That is, we calculate the various values $R_{i_1 i_2 \cdots i_k}(E)$ and choose the decision $H_{j_1 j_2 \cdots j_k}$ if $R_{j_1 j_2 \cdots j_k}(E)$ is the minimum of the values of $R_{i_1 i_2 \cdots i_k}(E)$ for all values of $i_1, i_2, \cdots, i_k$. For making a decision on the basis of a given sample point $E$, it is not necessary to find explicit analytic formulas defining the shapes of the various regions of acceptance.
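In code, the pointwise rule is a single minimization once the maximum risks have been evaluated. A minimal sketch, assuming the values $R_{i_1 \cdots i_k}(E)$ have already been computed and stored in a dictionary keyed by the tuple of subscripts (the numbers below are made up purely for illustration):

```python
def choose_by_minimum_risk(max_risks):
    """Return the subscript tuple whose maximum risk R(E) is smallest.

    max_risks maps a tuple of subscripts, e.g. () for H_0 or (1, 2) for
    H_12, to the already computed value of R(E) for that hypothesis.
    """
    return min(max_risks, key=max_risks.get)


if __name__ == "__main__":
    illustrative = {(): 0.9, (1,): 0.4, (2,): 0.7, (1, 2): 0.6}
    print(choose_by_minimum_risk(illustrative))
```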

Since the principle used here is proposed merely as a substitute for Wald’s 
principle for the sake of mathematical simplification, it is felt that in certain 
problems Wald’s principle may be used as a check on the results. For example, 
it is felt that the new principle is apt to lead to decision regions of the proper 
shape though the exact sizes of these regions may not be correct. In cases where 
the decision regions cannot be determined by Wald’s principle, it seems possible 
that a determination may be made in Wald’s sense among the various decision 
regions having the same shapes as those given by the new principle. In the 
case considered here, for example, it may be possible to determine new values of 
Hs Fis *** 5 ¥es- 

I should like to express my very great appreciation to Professor H. Hotelling 
for many suggestions during the preparation of this paper and to Professor A. 
Wald for constant guidance. I should also like to credit Professor Helen Walker 
with originally posing the question that led to this research. 


REFERENCES

[1] A. WALD, Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.
[2] A. WALD, On the Principles of Statistical Inference, Notre Dame, Ind., 1942.
[3] J. NEYMAN AND E. PEARSON, Transactions of the Royal Society, A., Vol. 231 (1933), p. 295.
[4] H. HOTELLING, Annals of Math. Stat., Vol. 2 (1931), pp. 360-378.


A TWO-SAMPLE TEST FOR A LINEAR HYPOTHESIS WHOSE POWER 
IS INDEPENDENT OF THE VARIANCE 


By CHARLES STEIN 
Asheville, N.C. 


1. Introduction. In a paper in the Annals of Mathematical Statistics, Dantzig [1] proves that, for a sample of fixed size, there does not exist a test for Student's hypothesis whose power is independent of the variance. Here, a two-sample test with this property will be presented, the size of the second sample depending upon the result of the first. The problem of determining confidence intervals, of preassigned length and confidence coefficient, for the mean of a normal distribution with unknown variance is solved by the same procedure. These considerations, including the non-existence of a single-sample test whose power is independent of the variance, are extended to the case of a linear hypothesis. In order to make the power of a test or the length of a confidence interval exactly independent of the variance, it appears necessary to waste a small part of the information. Thus, in practical applications, one will not use a test with this property, but rather a test which is uniformly more powerful, or an interval of the same length, whose confidence coefficient is a function of $\sigma$, but always greater than the desired value, the difference usually being slight, at the same time reducing the expected number of observations by a small amount.

Any two sample procedure, such as that discussed in this paper, can be con- 
sidered a special case of sequential analysis developed by Wald [5]. 

The problem of whether these tests and confidence intervals are in any sense 
optimum is unsolved. It is difficult even to formulate a definition of an optimum 
among sequential tests of a hypothesis against multiple alternatives. However 
it is shown that, if the variance and initial sample size are sufficiently large, the 
expected number of observations differs only slightly from the number of ob- 
servations required for a single-sample test when the variance is known. It also 
seems likely that the confidence intervals do possess some optimum property 
among the class of all two-sample procedures. 

Although Student's hypothesis is a special case of a linear hypothesis, it is treated separately, because it illustrates the basic idea without any complicated notation or new distributions. The test for Student's hypothesis involves the use only of Student's distribution, even for the power of the test, while the power function of the test proposed here for a linear hypothesis involves a new type of non-central F-distribution.

The notation $\chi_n^2$ is used as a generic symbol for a random variable equal to the sum of squares of $n$ independently normally distributed random variables with mean 0 and variance 1, i.e., $\chi_n^2$ has the $\chi^2$ distribution with $n$ degrees of freedom,

$$P\{\chi_n^2 < T\} = \frac{1}{(\sqrt{2})^n\,\Gamma(\frac{1}{2}n)}\int_0^T e^{-\frac{1}{2}u}u^{\frac{1}{2}n-1}\,du \quad \text{for } T > 0$$
$$= 0 \quad \text{for } T \le 0.$$

The notation $t_n$ is used as a generic symbol for $z/\sqrt{\chi_n^2/n}$, where $z$ is normally distributed with mean 0 and variance 1, independently of $\chi_n^2$, i.e., $t_n$ has the distribution of Student's $t$ with $n$ degrees of freedom,

$$P\{t_n < T\} = \frac{\Gamma(\frac{1}{2}(n+1))}{\sqrt{n\pi}\,\Gamma(\frac{1}{2}n)}\int_{-\infty}^{T}\left(1 + \frac{x^2}{n}\right)^{-\frac{1}{2}(n+1)}dx.$$

$F_{m,n}$ is a generic symbol for a random variable of the form $F_{m,n} = n\chi_m^2/m\chi_n^2$, the numerator and denominator being independently distributed, i.e., $F_{m,n}$ has the distribution of an F-ratio with $m$ and $n$ degrees of freedom,

$$P\{F_{m,n} < T\} = \frac{\Gamma(\frac{1}{2}(m+n))}{\Gamma(\frac{1}{2}m)\,\Gamma(\frac{1}{2}n)}\left(\frac{m}{n}\right)^{\frac{1}{2}m}\int_0^T x^{\frac{1}{2}m-1}\left(1 + \frac{m}{n}x\right)^{-\frac{1}{2}(m+n)}dx.$$

A symbol of the above type with an additional subscript $\alpha$ denotes the upper $100\alpha\%$ significance level, e.g., $t_{n,\alpha}$ is defined by

$$P\{t_n > t_{n,\alpha}\} = \alpha.$$

The symbol $E\{x \mid Q(x)\}$ denotes the set of all $x$ such that the condition $Q(x)$ holds. This should not be confused with $E(x \mid T)$, which denotes the expected value of a random variable $x$, given the conditions $T$.

The size of a critical region is the probability that the sample point will lie 
within the region under the null hypothesis. The terms length and volume, as 
applied to confidence regions are used in the ordinary geometrical sense. 


2. The test for Student's hypothesis. Suppose $x_i$, $i = 1, 2, \cdots$, are independently normally distributed with mean $\xi$ and variance $\sigma^2$. We wish to test the hypothesis $\xi = \xi_0$, the power of the test to depend only upon $\xi - \xi_0$, not upon $\sigma^2$. For this purpose we define a statistic $t'$ as follows. A sample of $n_0$ observations, $x_1, \cdots, x_{n_0}$, is taken, and the sample estimate, $s^2$, of the variance computed by

$$s^2 = \frac{1}{n_0 - 1}\left\{\sum_{i=1}^{n_0}x_i^2 - \frac{1}{n_0}\left(\sum_{i=1}^{n_0}x_i\right)^2\right\}. \tag{1}$$

Then $n$ is defined by

$$n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0 + 1\right\}, \tag{2}$$

where $z$ is a previously specified positive constant, $[q]$ denoting the greatest integer less than $q$. Additional observations, $x_{n_0+1}, \cdots, x_n$, are taken, and, in accordance with an initially specified rule depending only upon $s^2$, real numbers $a_i$, $i = 1, \cdots, n$, are chosen in such a way that

$$\sum_1^n a_i = 1, \qquad a_1 = a_2 = \cdots = a_{n_0}, \qquad s^2\sum_1^n a_i^2 = z. \tag{3}$$

This is clearly possible since

$$\min\sum_1^n a_i^2 = \frac{1}{n} \le \frac{z}{s^2} \quad \text{by (2)}, \tag{4}$$

the minimum being taken subject to the conditions

$$\sum_1^n a_i = 1, \qquad a_1 = a_2 = \cdots = a_{n_0}.$$

Then $t'$ is defined by

$$t' = \frac{\sum a_ix_i - \xi_0}{\sqrt{z}} = \frac{\sum a_i(x_i - \xi)}{\sqrt{z}} + \frac{\xi - \xi_0}{\sqrt{z}} = u + \frac{\xi - \xi_0}{\sqrt{z}} \tag{5}$$

where

$$u = \frac{\sum a_i(x_i - \xi)}{\sqrt{z}}. \tag{6}$$

Then $u$ has the distribution of Student's $t$ with $n_0 - 1$ degrees of freedom, regardless of the value of $\sigma^2$. For $(n_0 - 1)s^2/\sigma^2$ has the distribution of $\chi_{n_0-1}^2$ and the conditional distribution of $\frac{1}{\sqrt{z}}\sum a_i(x_i - \xi) = u$, given $s$, is normal with mean 0 and variance $\sigma^2\sum a_i^2/z = \sigma^2/s^2$. But the usual form of a random variable $t_{n_0-1}$ is $t_{n_0-1} = y/s$, $y$ being normally distributed with mean 0 and variance $\sigma^2$, and $(n_0 - 1)s^2/\sigma^2$ having the distribution of $\chi_{n_0-1}^2$, independent of $y$. Thus the conditional distribution of $t_{n_0-1}$, given $s$, is likewise normal with mean 0 and variance $\sigma^2/s^2$, so that $t_{n_0-1}$ and $u$ have the same distribution.
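The construction in (1) to (6) can be illustrated numerically. The paper only requires that the weights $a_i$ follow some rule fixed in advance and depending on $s^2$ alone; the sketch below adopts one such rule as an assumption, namely a common weight $a$ for the first $n_0$ observations and a common weight $b$ for the remaining $n - n_0$, solved from the two conditions in (3). Function names are illustrative.

```python
import math


def first_stage_variance(first_sample):
    """Equation (1): variance estimate from the n0 first-stage observations."""
    n0 = len(first_sample)
    total = sum(first_sample)
    return (sum(v * v for v in first_sample) - total * total / n0) / (n0 - 1)


def total_sample_size(s2, z, n0):
    """Equation (2).  [q] is the greatest integer strictly less than q,
    so [q] + 1 equals ceil(q) for q >= 0."""
    return max(math.ceil(s2 / z), n0 + 1)


def stein_weights(s2, z, n0, n):
    """One admissible choice of a_1..a_n (an assumption of this sketch):
    equal weight a on the first n0 observations, equal weight b on the rest,
    with sum(a_i) = 1 and s2 * sum(a_i^2) = z."""
    m = n - n0
    c = z / s2                                   # required value of sum(a_i^2)
    d = math.sqrt(max(m * (n * c - 1.0) / n0, 0.0))
    a = (1.0 - d) / n
    b = (m + n0 * d) / (n * m)
    return [a] * n0 + [b] * m


def t_prime(sample, weights, xi0, z):
    """Equation (5): the statistic t' for the full sample."""
    return (sum(w * x for w, x in zip(weights, sample)) - xi0) / math.sqrt(z)


if __name__ == "__main__":
    n = total_sample_size(s2=1.0, z=0.5, n0=5)
    w = stein_weights(1.0, 0.5, 5, n)
    print(n, sum(w), 1.0 * sum(v * v for v in w))
```

With $s^2 = 1$, $z = 0.5$ and $n_0 = 5$ the rule gives $n = 6$, and the printed sums reproduce the two conditions in (3) up to rounding.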

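This theorem can be used to obtain an unbiased test for the hypothesis $H_0$ that $\xi = \xi_0$, the power being independent of $\sigma^2$, which is supposed unknown. Let $\alpha$ be the desired size of the critical region and let $t_{n_0-1,\alpha/2}$ be such that

$$P\{t_{n_0-1} > t_{n_0-1,\alpha/2}\} = \frac{\alpha}{2}. \tag{7}$$

Then if we reject $H_0$ whenever

$$\left|\frac{\sum_1^n a_ix_i - \xi_0}{\sqrt{z}}\right| > t_{n_0-1,\alpha/2}, \tag{8}$$

we obtain an unbiased test of $H_0$, whose power function is $1 - \beta(\xi)$ where

$$\beta(\xi) = P\left\{-t_{n_0-1,\alpha/2} + \frac{\xi_0 - \xi}{\sqrt{z}} < t_{n_0-1} < t_{n_0-1,\alpha/2} + \frac{\xi_0 - \xi}{\sqrt{z}}\right\}. \tag{9}$$

The fact that the test is unbiased follows immediately from the symmetry and unimodality of the $t$ distribution.

If we wish to test the hypothesis $H_0$: $\xi = \xi_0$ against one-sided alternatives $\xi > \xi_0$, the procedure is similar. The critical region of size $\alpha$ is defined by

$$\frac{\sum_1^n a_ix_i - \xi_0}{\sqrt{z}} > t_{n_0-1,\alpha} \tag{10}$$

and the power function is

$$1 - \beta(\xi) = P\left\{t_{n_0-1} > t_{n_0-1,\alpha} + \frac{\xi_0 - \xi}{\sqrt{z}}\right\}. \tag{11}$$

A confidence interval for $\xi$, of predetermined length $l$ and confidence coefficient $1 - \alpha$, can be obtained by selecting $z$ so that

$$1 - \alpha = P\left\{-\frac{l}{2\sqrt{z}} < t_{n_0-1} < \frac{l}{2\sqrt{z}}\right\} = P\left\{-\frac{l}{2\sqrt{z}} < \frac{\sum_1^n a_i(x_i - \xi)}{\sqrt{z}} < \frac{l}{2\sqrt{z}}\right\} = P\left\{\sum a_ix_i - \frac{l}{2} < \xi < \sum a_ix_i + \frac{l}{2}\right\} \tag{12}$$

where $\xi$ is the true mean of the distribution. Thus $(\sum a_ix_i - l/2,\ \sum a_ix_i + l/2)$ is the desired confidence interval.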
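Relation (12) fixes $z$ once the length $l$ and the critical value are chosen, since it requires $l/(2\sqrt{z}) = t_{n_0-1,\alpha/2}$. A minimal sketch follows; the critical value must be supplied from tables by the user, and the function names are illustrative.

```python
def z_for_interval(length, t_critical):
    """Solve l / (2 * sqrt(z)) = t_{n0-1, alpha/2} for z."""
    return (length / (2.0 * t_critical)) ** 2


def fixed_length_interval(sample, weights, length):
    """Interval (sum a_i x_i - l/2, sum a_i x_i + l/2) from relation (12)."""
    center = sum(w * x for w, x in zip(weights, sample))
    return center - length / 2.0, center + length / 2.0


if __name__ == "__main__":
    # 2.262 is the tabled upper 2.5% point of Student's t with 9 d.f.
    print(z_for_interval(length=1.0, t_critical=2.262))
    print(fixed_length_interval([1.0, 2.0, 3.0], [1 / 3, 1 / 3, 1 / 3], 1.0))
```

A smaller $z$ (shorter interval or higher confidence) raises $s^2/z$ and therefore the total sample size required by (2).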

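In the above tests and confidence intervals, the distribution of the required number of observations, $n$, is

$$P\{n = n_0 + 1\} = P\left\{\frac{s^2}{z} < n_0 + 1\right\} = P\{(n_0 - 1)s^2/\sigma^2 < (n_0 + 1)(n_0 - 1)z/\sigma^2\} = P\{\chi_{n_0-1}^2 < y\}$$
$$= \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\frac{1}{2}(n_0 - 1))}\int_0^y e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du \tag{13}$$

where $y = (n_0^2 - 1)z/\sigma^2$,

$$P\{n = \nu\} = P\left\{\nu - 1 < \frac{s^2}{z} \le \nu\right\} = P\{(\nu - 1)(n_0 - 1)z/\sigma^2 < \chi_{n_0-1}^2 < \nu(n_0 - 1)z/\sigma^2\}$$
$$= \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\frac{1}{2}(n_0 - 1))}\int_{(\nu-1)(n_0-1)z/\sigma^2}^{\nu(n_0-1)z/\sigma^2} e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du \tag{14}$$

for integral $\nu > n_0 + 1$, all other values being impossible. Thus the expected number of observations, $E(n)$, satisfies the inequalities

$$\frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\frac{1}{2}(n_0 - 1))}\left\{\int_0^y (n_0 + 1)e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du + \int_y^\infty \frac{\sigma^2u}{z(n_0 - 1)}e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du\right\} < E(n)$$
$$< \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\frac{1}{2}(n_0 - 1))}\left\{\int_0^y (n_0 + 1)e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du + \int_y^\infty \left(\frac{\sigma^2u}{z(n_0 - 1)} + 1\right)e^{-\frac{1}{2}u}u^{\frac{1}{2}(n_0-3)}\,du\right\}, \tag{15}$$

which can be rewritten

$$(n_0 + 1)P\{\chi_{n_0-1}^2 < y\} + \frac{\sigma^2}{z}P\{\chi_{n_0+1}^2 > y\} < E(n) < (n_0 + 1)P\{\chi_{n_0-1}^2 < y\} + \frac{\sigma^2}{z}P\{\chi_{n_0+1}^2 > y\} + P\{\chi_{n_0-1}^2 > y\}. \tag{16}$$

Consequently $E(n)$ is a function of $\sigma^2$, and can be evaluated from tables of the incomplete $\Gamma$ function.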

As mentioned in the introduction, these tests and confidence intervals will not be used exactly in this form, since they waste information in order to make the power of the test or the length of the confidence interval strictly independent of the variance. Instead of (2) we take a total of

$$n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0\right\} \tag{17}$$

observations, and define

$$t'' = \frac{\left(\frac{1}{n}\sum_1^n x_i - \xi_0\right)\sqrt{n}}{s} = \frac{\left(\frac{1}{n}\sum_1^n x_i - \xi\right)\sqrt{n}}{s} + \frac{(\xi - \xi_0)\sqrt{n}}{s} = u' + \frac{(\xi - \xi_0)\sqrt{n}}{s}. \tag{18}$$

By the same reasoning as that following (6), $u'$ has the $t$ distribution with $n_0 - 1$ degrees of freedom. By (17)

$$\frac{s^2}{n} \le z, \tag{19}$$

so that, although $\sqrt{n}/s$ is a random variable,

$$\frac{\sqrt{n}}{s} \ge \frac{1}{\sqrt{z}}. \tag{20}$$

Thus, if we use

$$|t''| > t_{n_0-1,\alpha/2} \quad \text{or} \quad t'' > t_{n_0-1,\alpha} \tag{21}$$

instead of (8) or (10) respectively, we shall always increase the power of the test. Also the expected number of observations will be reduced from that in (16) by $P\{\chi_{n_0-1}^2 < y\}$. Similarly if $z$ is defined as in (12), the interval

$$\left(\frac{1}{n}\sum_1^n x_i - \frac{l}{2},\ \frac{1}{n}\sum_1^n x_i + \frac{l}{2}\right)$$

has length $l$, and the probability that it covers the true mean $\xi$ is a function of $\sigma$, but is always greater than $1 - \alpha$, and differs only slightly from $1 - \alpha$ if $\sigma^2 > n_0z$. Thus it can be used instead of the confidence interval (12).
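The practical variant replaces the weighted mean by the ordinary mean of all $n$ observations while keeping the first-stage variance estimate. The following sketch assumes that $s^2$ has already been computed from the first $n_0$ observations by (1); names are illustrative.

```python
import math


def modified_sample_size(s2, z, n0):
    """Total sample size max([s2/z] + 1, n0); [q] + 1 equals ceil(q) because
    [q] is the greatest integer strictly less than q."""
    return max(math.ceil(s2 / z), n0)


def modified_statistic(sample, s2, xi0):
    """(mean - xi0) * sqrt(n) / s, with s2 taken from the first stage only."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - xi0) * math.sqrt(n) / math.sqrt(s2)


if __name__ == "__main__":
    print(modified_sample_size(s2=1.0, z=0.5, n0=5))    # no second sample needed
    print(modified_sample_size(s2=10.0, z=0.5, n0=5))   # second sample required
    print(modified_statistic([1.0, 2.0, 3.0, 4.0], s2=4.0, xi0=0.5))
```

Because the total sample size is never smaller than $s^2/z$, the factor $\sqrt{n}/s$ is never smaller than $1/\sqrt{z}$, which is the source of the gain in power noted above.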
From (16) it follows that

$$\lim_{\sigma\to\infty}\left\{E(n) - \frac{\sigma^2}{z}\right\} \le 1, \qquad \lim_{\sigma\to\infty}\left\{E(n) - \frac{\sigma^2}{z}\right\} \ge 0,$$

the approximation $E(n) = \sigma^2/z$ being fair provided $\sigma^2 > zn_0$. The length of the confidence interval (12) is given by

$$l = 2t_{n_0-1,\alpha/2}\sqrt{z} \approx \frac{2\sigma t_{n_0-1,\alpha/2}}{\sqrt{E(n)}}.$$

When the variance $\sigma^2$ is known, the length of the single-sample confidence interval of confidence coefficient $1 - \alpha$ obtained on the basis of $n$ observations is given by

$$1 - \alpha = \frac{1}{\sqrt{2\pi}}\int_{-l\sqrt{n}/2\sigma}^{l\sqrt{n}/2\sigma} e^{-\frac{1}{2}x^2}\,dx, \qquad l = 2t_{\infty,\alpha/2}\,\sigma/\sqrt{n}.$$

Since, even for moderate values of $n_0$, say $n_0 > 30$, $t_{n_0-1,\alpha/2}$ differs only slightly from $t_{\infty,\alpha/2}$, the expected number of observations for a confidence interval of given length and confidence coefficient is only slightly larger than the fixed number of observations required in the single-sample case when the variance is known, provided the variance is moderately large.

3. Distribution of a non-central F-ratio. In the extension of the above considerations to the testing of a general linear hypothesis, the power function depends on the distribution of a quantity

$$F' = \sum_1^m (q_i - c_i)^2, \tag{22}$$

where $q_i = x_i/\sqrt{r}$, the $x_i$ being independently normally distributed with mean 0 and variance 1, and $r$ having the $\chi_n^2$ distribution, independently of the $x_i$. The $c_i$ are real constants.

Let 


(23) ¢ = » C; e, | /> c 
1 


m 


yo (2; — e;¢) —_ > 23 a Pa 
: 1 


mm" 


1 1 


m 
Now, >> (2; — ei¢)’ is « quadratic form of rank m — 1 since the x; — cit are 
1 


subject to one linear homogeneous restriction, namely >> ¢(x; — c<) = 0. 
2 


™ 
Also @ is of rank 1, and x* + °° = >, x; so that, by Cochran’s Theorem, x and 
1 


? e . ‘ 9 > _— ° 

© are independently distributed as x3,1 and xj respectively. Thus there exist 
Y1-** Ym, independently normally distributed with mean 0 and variance 1 
such that 


(25) = ye tees + Ym 
Yi. 


Let u; = Vi . Then the joint distribution of u, --- Um is given by 


1 1 
> Um < Tm} = (\/2r)” (+/2)" T(3n) 


ae | ee 
0 — . 


o 


Plu <m,°-: 


(26) 













The density function is given by 
a” Pith < 11, +++) tm < tal 


O71 +--+ OTm 


}(n—2) rim ob dr 


7 ve ay" "T(4n) [- 


bh (438 i(n+m—2) 7 
(27) “ Ey (/2)” Tan © eH), ” 


(+84) 


7 (/x y™ 2h") Tan) 
P(3(n + m)) ~ ¢ -}(m+n) 
© (Wr)"P(3n) (: > )' 





ag ¢? k(n+m—2) dt 


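Then let

$$\eta' = u_1, \qquad \eta''^2 = u_2^2 + \cdots + u_m^2. \tag{28}$$

The joint distribution of $\eta'$ and $\eta''^2$ is thus, by (27),

$$\begin{aligned}
P\{\eta' < \tau_1,\ \eta''^2 < \tau_2\}
&= \frac{\Gamma(\tfrac12(m+n))}{(\sqrt{\pi})^m \Gamma(\tfrac12 n)} \int \cdots \int_{u_1 < \tau_1,\ \sum_2^m u_i^2 < \tau_2} \Big(1 + \sum_{1}^{m} u_i^2\Big)^{-\frac12(m+n)} du_1 \cdots du_m \\
&= \frac{\Gamma(\tfrac12(m+n))}{(\sqrt{\pi})^m \Gamma(\tfrac12 n)} \int \cdots \int_{u_1 < \tau_1,\ \sum_2^m v_i^2 < \tau_2/(1+u_1^2)} (1 + u_1^2)^{-\frac12(m+n) + \frac12(m-1)} \Big(1 + \sum_{2}^{m} v_i^2\Big)^{-\frac12(m+n)} du_1\, dv_2 \cdots dv_m \\
&= \frac{\Gamma(\tfrac12(m+n))}{(\sqrt{\pi})^m \Gamma(\tfrac12 n)} \int \cdots \int_{u_1 < \tau_1,\ \sum_2^m v_i^2 < \tau_2/(1+u_1^2)} (1 + u_1^2)^{-\frac12(n+1)} \Big(1 + \sum_{2}^{m} v_i^2\Big)^{-\frac12(m+n)} du_1\, dv_2 \cdots dv_m,
\end{aligned} \tag{29}$$

where $v_i = u_i/\sqrt{1 + u_1^2}$, $i = 2, \dots, m$.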


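In order to evaluate this integral, we use the fact that the distribution of a ratio
of $\chi^2_{m-1}$ to $\chi^2_{n+1}$, the two being independent, can be expressed in two forms, by
(27) and Wilks [2], p. 114,

$$\begin{aligned}
P\{\chi^2_{m-1}/\chi^2_{n+1} < \psi\}
&= \frac{\Gamma(\tfrac12(m+n))}{\Gamma(\tfrac12(m-1))\,\Gamma(\tfrac12(n+1))} \int_0^{\psi} q^{\frac12(m-1)-1} (1 + q)^{-\frac12(m+n)}\, dq \\
&= \frac{\Gamma(\tfrac12(m+n))}{(\sqrt{\pi})^{m-1}\,\Gamma(\tfrac12(n+1))} \int \cdots \int_{\sum_1^{m-1} v_i^2 < \psi} \Big(1 + \sum_{1}^{m-1} v_i^2\Big)^{-\frac12(m+n)} dv_1 \cdots dv_{m-1},
\end{aligned} \tag{30}$$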

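so that

$$\begin{aligned}
P\{\eta' < \tau_1,\ \eta''^2 < \tau_2\}
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \int_{u_1 < \tau_1} \int_{q=0}^{q=\tau_2/(1+u_1^2)} (1 + u_1^2)^{-\frac12(n+1)}\, q^{\frac12(m-3)} (1 + q)^{-\frac12(m+n)}\, dq\, du_1 \\
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \int_{-\infty}^{\tau_1} \int_{0}^{\tau_2} (1 + u_1^2)^{-\frac12(n+1)}\, q^{\frac12(m-3)}\, (1 + u_1^2)^{-\frac12(m-3)-1} \Big(1 + \frac{q}{1 + u_1^2}\Big)^{-\frac12(m+n)} dq\, du_1 \\
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \int_{u_1=-\infty}^{\tau_1} \int_{q=0}^{\tau_2} (1 + u_1^2 + q)^{-\frac12(m+n)}\, q^{\frac12(m-3)}\, dq\, du_1.
\end{aligned} \tag{31}$$

Now we wish to find the distribution of

$$\begin{aligned}
F' = \sum_{i=1}^{m} (t_i - c_i)^2
&= \sum_{i=1}^{m} \frac{(x_i - \sqrt{r}\, c_i)^2}{r} \\
&= \frac{\zeta^2}{r} + \frac{\big(\xi - \sqrt{r \sum c_i^2}\big)^2}{r} \\
&= \eta''^2 + \Big(\eta' - \sqrt{\textstyle\sum c_i^2}\Big)^2.
\end{aligned} \tag{32}$$

Carrying out the transformation (32), it is found that the joint density function
of $\eta'$ and $F'$ is

$$\begin{aligned}
p(\eta', F')\, d\eta'\, dF'
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \Big[F' - \big(\eta' - \sqrt{\textstyle\sum c_i^2}\big)^2\Big]^{\frac12(m-3)} \\
&\qquad \times \Big[1 + \eta'^2 + F' - \big(\eta' - \sqrt{\textstyle\sum c_i^2}\big)^2\Big]^{-\frac12(m+n)} d\eta'\, dF' \\
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \big[F' - \rho^2\big]^{\frac12(m-3)} \\
&\qquad \times \Big[1 + F' + 2\rho\sqrt{\textstyle\sum c_i^2} + \textstyle\sum c_i^2\Big]^{-\frac12(m+n)} d\rho\, dF',
\end{aligned} \tag{33}$$

where $\rho = \eta' - \sqrt{\sum c_i^2}$. In order to obtain the distribution of $F'$ we must
integrate out $\rho$ over $-\sqrt{F'} < \rho < \sqrt{F'}$, obtaining

$$\begin{aligned}
P\{F' < T\} &= \Phi_{m,n}\big(T, \textstyle\sum c_i^2\big) \\
&= \frac{\Gamma(\tfrac12(m+n))}{\sqrt{\pi}\,\Gamma(\tfrac12 n)\,\Gamma(\tfrac12(m-1))} \int_{F'=0}^{T} \int_{\rho=-\sqrt{F'}}^{\sqrt{F'}} (F' - \rho^2)^{\frac12(m-3)} \Big[1 + F' + 2\rho\sqrt{\textstyle\sum c_i^2} + \textstyle\sum c_i^2\Big]^{-\frac12(m+n)} d\rho\, dF'.
\end{aligned} \tag{34}$$

In the case $\sum c_i^2 = 0$, (34) reduces to the distribution of the ratio $\chi^2_m/\chi^2_n$.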
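The reduction in (32), by which $F'$ depends on the constants $c_i$ only through $\sum c_i^2$, can be checked numerically. The following is a minimal sketch and not part of the derivation: the function names, the sample constants `c`, and the choice $m = 4$, $n = 9$ are illustrative assumptions. It evaluates $F'$ once directly from (22) and once from $\eta'$ and $\eta''^2$ as defined through (23), (24) and (28).

```python
import math
import random

def f_prime_direct(x, r, c):
    """F' = sum_i (t_i - c_i)^2 with t_i = x_i / sqrt(r), as in (22)."""
    return sum((xi / math.sqrt(r) - ci) ** 2 for xi, ci in zip(x, c))

def f_prime_decomposed(x, r, c):
    """F' rebuilt from eta' and eta''^2, following (23), (24), (28) and (32)."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    xi = sum(ci * x_i for ci, x_i in zip(c, x)) / norm_c   # (23)
    zeta_sq = sum(x_i * x_i for x_i in x) - xi * xi          # (24)
    eta1 = xi / math.sqrt(r)                                 # eta'
    eta2_sq = zeta_sq / r                                    # eta''^2
    return eta2_sq + (eta1 - norm_c) ** 2                    # (32)

# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(0)
m, n = 4, 9
c = [0.3, -1.2, 0.5, 2.0]
x = [rng.gauss(0.0, 1.0) for _ in range(m)]
r = rng.gammavariate(n / 2, 2.0)   # chi-square variate with n degrees of freedom

direct = f_prime_direct(x, r, c)
decomposed = f_prime_decomposed(x, r, c)
```

The two values agree up to rounding error for any draw, which is the algebraic content of (32).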







4. Test of a linear hypothesis. In this case the power of the test usually 
employed is affected not only by the variance, but also by the values of the pre- 
dictors. In order to avoid this difficulty, it will be assumed that only a prede- 
termined number of different sets of predictors are used, and that these sets are 
repeated as a whole, as many times as is necessary. This covers, in particular, 
the replication of orthogonal designs for the analysis of variance. 

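Let $y_{ij}$, $i = 1, \dots, t$, $j = 1, 2, \dots$ be independently normally distributed
with means

$$E y_{ij} = \sum_{k=1}^{\mu} a_k x_{ki}, \qquad \mu \le t, \quad \operatorname{rank}\,(x_{ki}) = \mu, \tag{35}$$

and variance $\sigma^2$, the $x_{ki}$ being given in advance, $\sigma$ and $a_k$ unknown. We wish
to test

$$H_0: \sum_{k=1}^{\mu} c_{lk} a_k = c_{l0}, \qquad l = 1, \dots, p \le \mu, \tag{36}$$

where we may suppose equations (36) linearly independent, the $c_{lk}$ being given
constants. It will be convenient to reduce this to a canonical form, as in
Tang [3]. First, by a non-singular linear transformation

$$x'_{ki} = \sum_{l=1}^{\mu} b_{kl} x_{li} \tag{37}$$

we can make

$$\sum_{i=1}^{t} \begin{pmatrix} x'_{1i} \\ \vdots \\ x'_{\mu i} \end{pmatrix} \begin{pmatrix} x'_{1i} & \cdots & x'_{\mu i} \end{pmatrix} = I_\mu, \quad \text{the } \mu \times \mu \text{ identity matrix,} \tag{38}$$

any two sets of $b_{kl}$ that accomplish this being related by an orthogonal
transformation. Then (35) becomes

$$E y_{ij} = \sum_{k=1}^{\mu} a'_k x'_{ki}, \qquad a'_k = \sum_{m=1}^{\mu} b^{mk} a_m, \tag{39}$$

and (36) becomes

$$c_{l0} = \sum_{m=1}^{\mu} c'_{lm} a'_m, \qquad c'_{lm} = \sum_{k=1}^{\mu} c_{lk} b_{mk}, \tag{40}$$

where the $b^{mk}$ are such that $\sum_k b^{mk} b_{kl} = \delta_{ml}$, the Kronecker delta, or, in matrix
notation, $(b^{lm}) = (b_{lm})^{-1}$. Next, the equations (40) can be made into an orthonormal set

$$c''_{l0} = \sum_{m=1}^{\mu} c''_{lm} a'_m, \tag{41}$$

i.e., one in which

$$\sum_{m=1}^{\mu} c''_{km} c''_{lm} = \delta_{kl}, \tag{42}$$

by a non-singular linear transformation on the $c'_{lm}$. Clearly $\sum_l c''^{\,2}_{l0}$ is an invariant
of (41), i.e., it does not depend upon the choice of a particular transformation
(37), or of a particular transformation of the $c'_{lm}$ into $c''_{lm}$, since, in both cases,
all admissible transformations are connected by an orthogonal transformation.
Then we define

$$y'_{ij} = \sum_{q=1}^{t} x'_{iq} y_{qj}, \qquad i = 1, \dots, \mu, \tag{43}$$

$$y'_{ij} = \sum_{q=1}^{t} d_{iq} y_{qj}, \qquad i = \mu + 1, \dots, t, \tag{44}$$

in such a way that $\begin{pmatrix} x'_{iq} \\ d_{iq} \end{pmatrix}$ is an orthogonal matrix, which is possible by (38).
Then

$$E y'_{ij} = \sum_{q=1}^{t} x'_{iq} E y_{qj} = \sum_{q=1}^{t} x'_{iq} \sum_{k=1}^{\mu} x'_{kq} a'_k = \sum_{k=1}^{\mu} a'_k \sum_{q=1}^{t} x'_{iq} x'_{kq} = a'_i \quad \text{for } i = 1, \dots, \mu, \tag{45}$$

$$E y'_{ij} = \sum_{q=1}^{t} d_{iq} E y_{qj} = \sum_{q=1}^{t} d_{iq} \sum_{k=1}^{\mu} x'_{kq} a'_k = \sum_{k=1}^{\mu} a'_k \sum_{q=1}^{t} d_{iq} x'_{kq} = 0 \quad \text{for } i = \mu + 1, \dots, t. \tag{46}$$

Finally we define

$$y''_{ij} = y'_{ij}, \qquad i = \mu + 1, \dots, t, \tag{47}$$

$$y''_{ij} = \sum_{m=1}^{\mu} c''_{im} y'_{mj} - c''_{i0}, \qquad i = 1, \dots, p, \tag{48}$$

$$y''_{ij} = \sum_{m=1}^{\mu} e_{im} y'_{mj}, \qquad i = p + 1, \dots, \mu, \tag{49}$$

where the $e_{im}$ are such that $\begin{pmatrix} c''_{im} \\ e_{im} \end{pmatrix}$ is an orthogonal matrix. Since the linear
transformation applied to the $y_{ij}$ to obtain $y''_{ij}$ is orthogonal, the $y''_{ij}$ are independently
normally distributed with variance $\sigma^2$. Also

$$E y''_{ij} = 0, \qquad i = \mu + 1, \dots, t, \tag{50}$$

$$E y''_{ij} = \sum_{m=1}^{\mu} c''_{im} a'_m - c''_{i0}, \qquad i = 1, \dots, p, \tag{51}$$

$$E y''_{ij} = \sum_{m=1}^{\mu} e_{im} a'_m, \qquad i = p + 1, \dots, \mu. \tag{52}$$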

























Since (50), (51), (52) were obtained from the original formulation by a non- 
singular linear transformation, the derivation can be reversed, which implies 
the equivalence of (50), (51), (52) to the problem as originally formulated. 

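Thus we can restate the problem in the following manner. Let $y_{ij}$, $i = 1,
\dots, t$, $j = 1, 2, \dots$ be independently normally distributed with variance $\sigma^2$
and means

$$\begin{aligned}
E y_{ij} &= \xi_i, \qquad i = 1, \dots, \mu, \\
E y_{ij} &= 0, \qquad i = \mu + 1, \dots, t, \quad \xi_i \text{ and } \sigma \text{ unknown.}
\end{aligned} \tag{53}$$

We wish to test

$$H_0: \xi_i = 0, \qquad i = 1, \dots, p \le \mu, \tag{54}$$

the $\xi_i$ for $i = p + 1, \dots, \mu$ and $\sigma^2$ being nuisance parameters.
Obtain a first sample $y_{ij}$, $i = 1, \dots, t$, $j = 1, \dots, n_0$. Estimate the variance by

$$s^2 = \frac{1}{n_0 t - \mu} \Big\{ \sum_{j=1}^{n_0} \sum_{i=1}^{t} y_{ij}^2 - \frac{1}{n_0} \sum_{i=1}^{\mu} \Big( \sum_{j=1}^{n_0} y_{ij} \Big)^2 \Big\}. \tag{55}$$

Let $z$ be a predetermined constant, and $n$ be defined by

$$n = \max \Big\{ \Big[ \frac{s^2}{z} \Big] + 1,\ n_0 + 1 \Big\}. \tag{56}$$

After $s^2$ has been obtained, determine a set of real numbers, $a_1, \dots, a_n$, in
accordance with a preassigned rule, so as to satisfy

$$\sum_{j=1}^{n} a_j = 1, \qquad s^2 \sum_{j=1}^{n} a_j^2 = z, \qquad a_1 = a_2 = \cdots = a_{n_0}. \tag{57}$$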
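The conditions (57) do not fix the $a_j$ uniquely; the paper asks only for some preassigned rule. As a minimal sketch, one admissible rule (an assumption made here for illustration, not necessarily the author's) gives a common weight `a` to the first $n_0$ observations and a common weight `b` to the remaining $n - n_0$, and solves the two equations of (57) for `a` and `b`. The function names and the numerical inputs are hypothetical.

```python
import math

def second_stage_size(s2, z, n0):
    """Total number n of sets of observations from (56): max([s^2/z] + 1, n0 + 1)."""
    return max(math.floor(s2 / z) + 1, n0 + 1)

def weights(s2, z, n0):
    """One possible choice of a_1..a_n meeting (57); the two-level rule is an assumption."""
    n = second_stage_size(s2, z, n0)
    k = n - n0                      # observations taken after the first sample
    q = z / s2                      # required value of sum a_j^2
    # Solve n0*a + k*b = 1 and n0*a^2 + k*b^2 = q for the common weights a and b.
    # n > s^2/z by (56), so n*q - 1 > 0 and the square root is real.
    a = (1.0 - math.sqrt(k * (n * q - 1.0) / n0)) / n
    b = (1.0 - n0 * a) / k
    return [a] * n0 + [b] * k

# Illustrative inputs: s^2 = 2.0, z = 0.25, n0 = 5, so that n = 9.
w = weights(s2=2.0, z=0.25, n0=5)
```

With these inputs the weights sum to 1 and their squares sum to $z/s^2 = 0.125$, as (57) requires.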
Then

$$F = \frac{\sum_{i=1}^{p} \Big( \sum_{j=1}^{n} a_j y_{ij} \Big)^2}{z (n_0 t - \mu)} \tag{58}$$

has the non-central F-distribution given by (34) with $n = n_0 t - \mu$, $m = p$ and

$$\sum_{1}^{p} c_i^2 = \sum_{1}^{p} \xi_i^2 \big/ \big( (n_0 t - \mu) z \big), \tag{59}$$

where the $\xi_i$ are the true means, allowing for the possibility that $H_0$ is not true. For,
$(n_0 t - \mu) s^2/\sigma^2$ has the distribution of $\chi^2_{n_0 t - \mu}$ and, after it has been determined,
$\sum_{j=1}^{n} a_j y_{ij} - \xi_i$, $i = 1, \dots, p$, are independently normally distributed with mean 0
and variance $\sigma^2 \sum_j a_j^2 = \sigma^2 z/s^2$, so that, given $s$,
$\big( \sum_{j=1}^{n} a_j y_{ij} - \xi_i \big)/\sqrt{z}$, $i = 1, \dots, p$,
are independently normally distributed with mean 0 and variance $\sigma^2/s^2$. But
the random variables $t_i$ in section 3 are of the form $x_i/\sqrt{r}$ where the $x_i$ are
independently normally distributed with mean 0 and variance $\sigma^2$, while $r/\sigma^2$ has the
$\chi^2_{n_0 t - \mu}$ distribution independent of the $x_i$. Thus $t_i$ can be considered to have
been obtained by first selecting a stochastic variable $r$ such that $r/\sigma^2$ has the
distribution of $\chi^2_{n_0 t - \mu}$, and then selecting the $t_i$ to be independently normally
distributed, given $r$, with mean 0 and variance $\sigma^2/r$. Since $r$ corresponds with
$(n_0 t - \mu) s^2$, comparing this with the above, we find that

$$\frac{\sum_{j} a_j y_{ij} - \xi_i}{\sqrt{z}\,\sqrt{n_0 t - \mu}} \tag{60}$$

have the same joint distribution as the $t_i$. The $\xi_i/\sqrt{(n_0 t - \mu) z}$ are constants, so
that

$$F = \sum_{i=1}^{p} \frac{\Big( \sum_{j} a_j y_{ij} \Big)^2}{z (n_0 t - \mu)} = \sum_{i=1}^{p} \Big[ \frac{\sum_{j} a_j y_{ij} - \xi_i}{\sqrt{z (n_0 t - \mu)}} + \frac{\xi_i}{\sqrt{z (n_0 t - \mu)}} \Big]^2 \tag{61}$$

has the same distribution (34) as $\sum_{i=1}^{p} (t_i - c_i)^2$ with $c_i = \xi_i/\sqrt{(n_0 t - \mu) z}$.
i=l 


The tests of significance and confidence regions are obtained by a procedure
completely analogous to that used in the case of Student's hypothesis. If we
define $k = F_{p,\,n_0 t - \mu,\,\alpha}$ by

$$P\{F_{p,\,n_0 t - \mu} > k\} = \alpha, \tag{62}$$

then a critical region of size $\alpha$ for testing $H_0$ is given by

$$\frac{n_0 t - \mu}{p}\, F > k. \tag{63}$$

Its power function is

$$1 - \beta(\xi) = 1 - \Phi_{p,\,n_0 t - \mu} \Big( \frac{p k}{n_0 t - \mu},\ \frac{\sum \xi_i^2}{z (n_0 t - \mu)} \Big). \tag{64}$$

Similarly, a confidence region for $\xi_i$, $i = 1, \dots, p$, of confidence coefficient $1 - \alpha$
is given by the set of all $\xi_i$ such that

$$\frac{n_0 t - \mu}{p}\, F(\xi_1, \dots, \xi_p) < k, \tag{65}$$

where

$$F(\xi_1, \dots, \xi_p) = \frac{\sum_{i=1}^{p} \Big( \sum_{j=1}^{n} a_j y_{ij} - \xi_i \Big)^2}{z (n_0 t - \mu)}. \tag{66}$$

It is evident that this defines the interior of the hypersphere

$$\sum_{i=1}^{p} \Big( \xi_i - \sum_{j=1}^{n} a_j y_{ij} \Big)^2 < k z p \tag{67}$$

whose volume is independent of the variance $\sigma^2$.


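The distribution of $n$, the required number of sets of observations for the
above tests and confidence intervals, is given by

$$\begin{aligned}
P\{n = n_0 + 1\} &= P\{s^2/z < n_0 + 1\} \\
&= P\{(n_0 t - \mu) s^2/\sigma^2 < (n_0 + 1)(n_0 t - \mu) z/\sigma^2\} \\
&= P\{\chi^2_\delta < y\} = \frac{1}{2^{\frac12\delta}\, \Gamma(\tfrac12\delta)} \int_0^{y} e^{-\frac12 u} u^{\frac12\delta - 1}\, du,
\end{aligned} \tag{68}$$

where

$$y = (n_0 + 1)(n_0 t - \mu) z/\sigma^2, \qquad \delta = n_0 t - \mu, \tag{69}$$

and

$$\begin{aligned}
P\{n = \nu\} &= P\{\nu - 1 \le s^2/z < \nu\} \\
&= P\{(\nu - 1)\delta z/\sigma^2 < \chi^2_\delta < \nu \delta z/\sigma^2\} \\
&= \frac{1}{2^{\frac12\delta}\, \Gamma(\tfrac12\delta)} \int_{(\nu-1)\delta z/\sigma^2}^{\nu\delta z/\sigma^2} e^{-\frac12 u} u^{\frac12\delta - 1}\, du
\end{aligned} \tag{70}$$

for integral $\nu > n_0 + 1$, all other values being impossible.
Thus $E(n)$ satisfies the inequalities

$$\begin{aligned}
&\frac{1}{2^{\frac12\delta}\, \Gamma(\tfrac12\delta)} \Big\{ \int_0^{y} (n_0 + 1)\, e^{-\frac12 u} u^{\frac12\delta - 1}\, du + \int_y^{\infty} e^{-\frac12 u} u^{\frac12\delta - 1}\, \frac{\sigma^2 u}{\delta z}\, du \Big\} \\
&\qquad < E(n) \\
&\qquad < \frac{1}{2^{\frac12\delta}\, \Gamma(\tfrac12\delta)} \Big\{ \int_0^{y} (n_0 + 1)\, e^{-\frac12 u} u^{\frac12\delta - 1}\, du + \int_y^{\infty} e^{-\frac12 u} u^{\frac12\delta - 1} \Big( \frac{\sigma^2 u}{\delta z} + 1 \Big) du \Big\},
\end{aligned} \tag{71}$$

which can be rewritten

$$\begin{aligned}
&(n_0 + 1)\, P\{\chi^2_\delta < y\} + \frac{\sigma^2}{z}\, P\{\chi^2_{\delta+2} > y\} \\
&\qquad < E(n) \\
&\qquad < (n_0 + 1)\, P\{\chi^2_\delta < y\} + \frac{\sigma^2}{z}\, P\{\chi^2_{\delta+2} > y\} + P\{\chi^2_\delta > y\}.
\end{aligned} \tag{72}$$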
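The bounds (72) on $E(n)$ are easy to evaluate and to compare with a simulation of the sampling rule (56). The sketch below is illustrative only: the helper `chi2_cdf` (a plain power series for the chi-square distribution function), the function names, and the parameter values $n_0 = 5$, $t = 3$, $\mu = 2$, $z = 0.1$, $\sigma^2 = 1$ are all assumptions introduced here.

```python
import math
import random

def chi2_cdf(x, df):
    """P{chi^2_df < x}, summed from the series for the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= h / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def expected_n_bounds(n0, t, mu, z, sigma2):
    """Lower and upper bounds on E(n) as written in (72), with y and delta from (69)."""
    delta = n0 * t - mu
    y = (n0 + 1) * delta * z / sigma2
    lower = (n0 + 1) * chi2_cdf(y, delta) + (sigma2 / z) * (1.0 - chi2_cdf(y, delta + 2))
    upper = lower + (1.0 - chi2_cdf(y, delta))
    return lower, upper

def simulate_mean_n(n0, t, mu, z, sigma2, draws=20000, seed=1):
    """Monte Carlo estimate of E(n): draw s^2, then apply rule (56)."""
    rng = random.Random(seed)
    delta = n0 * t - mu
    total = 0
    for _ in range(draws):
        s2 = sigma2 * rng.gammavariate(delta / 2.0, 2.0) / delta
        total += max(math.floor(s2 / z) + 1, n0 + 1)
    return total / draws

lower, upper = expected_n_bounds(n0=5, t=3, mu=2, z=0.1, sigma2=1.0)
mean_n = simulate_mean_n(n0=5, t=3, mu=2, z=0.1, sigma2=1.0)
```

The gap between the two bounds is $P\{\chi^2_\delta > y\}$, which never exceeds one set of observations; the simulated mean should fall between `lower` and `upper` up to sampling error.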







The modifications required to avoid wasting information are exactly analogous 
to those made in the case of the test for Student’s hypothesis. 


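5. Non-existence of a single-sample test for a linear hypothesis whose power
is independent of the variance. The canonical form (see Tang [3]) for a linear
hypothesis in the single-sample case can be derived immediately from (53) and
(54). Let $x_i$, $i = 1, \dots, n$ be independently normally distributed with means

$$\begin{aligned}
E x_i &= \xi_i, \qquad i = 1, \dots, p, \\
E x_i &= 0, \qquad i = p + 1, \dots, n,
\end{aligned} \tag{73}$$

and variance $\sigma^2$. The $\xi_i$ and $\sigma$ are unknown, and we wish to test $H_0: \xi_i = 0$,
$i = 1, \dots, p$.
The most powerful test for $H_0$ against a given alternative $H_1: \xi_i = \xi_{i0}$, $i = 1, \dots, p$,
if the variance $\sigma^2$ is known, is that based upon the probability ratio (see Neyman
and Pearson [4])

$$\frac{p_1}{p_0} = \frac{\dfrac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp \Big\{ -\dfrac{1}{2\sigma^2} \Big[ \sum_{1}^{p} (x_i - \xi_{i0})^2 + \sum_{p+1}^{n} x_i^2 \Big] \Big\}}{\dfrac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp \Big\{ -\dfrac{1}{2\sigma^2} \sum_{1}^{n} x_i^2 \Big\}}. \tag{74}$$

Since any strictly increasing function of $p_1/p_0$ is equivalent for this purpose,
we can use

$$\varphi(x_1, \dots, x_p) = \sum_{1}^{p} \xi_{i0} x_i. \tag{75}$$

The critical region of size $\alpha$ based upon $\varphi$ is given by

$$W_0(\sigma) = \Big\{ \frac{\sum_{1}^{p} \xi_{i0} x_i}{\sigma \sqrt{\sum_{1}^{p} \xi_{i0}^2}} > k_\alpha \Big\}, \tag{76}$$

where

$$\frac{1}{\sqrt{2\pi}} \int_{k_\alpha}^{\infty} e^{-\frac12 x^2}\, dx = \alpha, \tag{77}$$

since, under $H_0$, $\sum_{1}^{p} \xi_{i0} x_i$ is normally distributed with mean 0 and variance
$\sigma^2 \sum_{1}^{p} \xi_{i0}^2$. Under $H_1$, $\sum_{1}^{p} \xi_{i0} x_i$ is normally distributed with mean $\sum_{1}^{p} \xi_{i0}^2$ and
variance $\sigma^2 \sum_{1}^{p} \xi_{i0}^2$. Thus the power of the test for the alternative $H_1$ as a
function of $\sigma^2$ is

$$\begin{aligned}
1 - \beta_0(\sigma) &= P\{x \in W_0(\sigma) \mid \xi_i = \xi_{i0},\ \sigma\} \\
&= P \Big\{ \frac{\sum \xi_{i0} x_i - \sum \xi_{i0}^2}{\sigma \sqrt{\sum \xi_{i0}^2}} > k_\alpha - \frac{\sqrt{\sum \xi_{i0}^2}}{\sigma} \Big\} \\
&= \frac{1}{\sqrt{2\pi}} \int_{k_\alpha - \sqrt{\sum \xi_{i0}^2}/\sigma}^{\infty} e^{-\frac12 x^2}\, dx.
\end{aligned} \tag{78}$$

Now let us suppose there exists a test based on the critical region $W$ of size $\alpha$
whose power $1 - \beta$ is independent of $\sigma$. Since $W_0(\sigma)$ is the best critical region
of size $\alpha$ for any $\sigma$ we must have

$$1 - \beta \le 1 - \beta_0(\sigma) = \frac{1}{\sqrt{2\pi}} \int_{k_\alpha - \sqrt{\sum \xi_{i0}^2}/\sigma}^{\infty} e^{-\frac12 x^2}\, dx, \tag{79}$$

so that

$$1 - \beta \le \operatorname*{g.l.b.}_{\sigma}\, [1 - \beta_0(\sigma)] = \frac{1}{\sqrt{2\pi}} \int_{k_\alpha}^{\infty} e^{-\frac12 x^2}\, dx = \alpha. \tag{80}$$

By interchanging $H_0$ and $H_1$ we can reverse the inequality (80), proving

$$1 - \beta = \alpha. \tag{81}$$

Thus any single-sample test for a linear hypothesis whose power is independent
of the variance has constant power equal to the size of the critical region.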
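The step from (79) to (80) rests on the fact that the power (78) of the best test decreases toward $\alpha$ as $\sigma$ increases. A minimal numerical sketch of this behaviour follows; the function name and the sample values of $\alpha$ and $\xi_{i0}$ are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def best_test_power(alpha, xi0, sigma):
    """Power 1 - beta_0(sigma) of the most powerful test, as in (78)."""
    std_normal = NormalDist()
    k_alpha = std_normal.inv_cdf(1.0 - alpha)              # defined by (77)
    shift = math.sqrt(sum(v * v for v in xi0)) / sigma     # sqrt(sum xi_i0^2) / sigma
    return 1.0 - std_normal.cdf(k_alpha - shift)

# Illustrative values only.
alpha = 0.05
xi0 = [1.0, 2.0]
sigmas = (0.5, 1.0, 5.0, 50.0, 1e6)
powers = [best_test_power(alpha, xi0, s) for s in sigmas]
```

The computed powers fall steadily with $\sigma$ and approach $\alpha$, so a power that does not depend on $\sigma$ cannot exceed $\alpha$.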


REFERENCES 

[1] George B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having
power functions independent of σ," Annals of Math. Stat., Vol. 11 (1940), p. 186.

[2] S. S. Wilks, Mathematical Statistics, Princeton, 1943.

[3] P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Mem.,
Vol. 2 (1938).

[4] Neyman and Pearson, Stat. Res. Mem., Vol. 1 (1936).

[5] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16,
June 1945.





COMPACT COMPUTATION OF THE INVERSE OF A MATRIX 
By Frederick V. Waugh and Paul S. Dwyer
War Food Administration and The University of Michigan 


1. Introduction. Among the most common applications of mathematics to
practical problems are the solution of simultaneous equations, the evaluation of 
determinants, and the computation of the complete inverse, (or the complete 
adjugate), of a given matrix. Even with modern computing machines these are 
laborious, time-consuming jobs. For that reason there has been great interest 
in recent years in the development of so-called “compact” methods; that is,
methods that eliminate all unnecessary detail, that use computing machines 
to do as much of the work as possible, and that only require copying the results 
needed in further analysis. 

In 1935 a paper by one of the authors [1] and since then papers by the other 
author [2], [3], [4], [5], [6] and [7] have outlined a variety of compact methods 
and have applied them to actual problems. These papers, together with other 
recent contributions, such as those presented in [8], [9] and [10], have resulted 
in much improved and more compact techniques in the general field of the solu- 
tion of linear simultaneous equations and allied topics, especially if the matrix 
is axi-symmetric. It is not generally recognized, however, that extension of 
these procedures (usually involving matrix factorization [7] [10]) can be used 
to compute the inverse (and adjugate) directly from the matrix factors without 
the necessity of the reduction of the unit matrix [11; 150] [2; 121] when the 
matrix is non-symmetric. 

The present paper extends the use of compact methods in three ways. 

(a) It presents a method of computing the inverse (and adjugate) of a sym- 
metric or non-symmetric matrix by compact Gaussian methods without the 
formal reduction of an auxiliary identity matrix. 

(b) It introduces the method of multiplication and subtraction with division— 
a modification of the method of multiplication and subtraction—and shows that 
the terms recorded in the compact solution are themselves determinants which 
are minors of the determinant of the matrix. 

(c) It uses the method of multiplication and subtraction with division as a 
compact means of computing the exact value of any minor of the determinant 
of the matrix (whether symmetric or non-symmetric). It further shows how all 
cofactors of order n — 1 (constituting the adjugate) can be computed from a 
compact presentation of the calculations of the determinant of the matrix. 


2. Gaussian methods and notation. Probably the method most generally 
used to solve simultaneous equations is the division method originated by Gauss 
[12]. Variations of this method are known as the Doolittle Method [13], the 
method of pivotal condensation [14], the method of single division [2; 104-112], 



and the Crout method [8]. The methods as outlined by Gauss and Doolittle 
are applicable only to axi-symmetric matrices (common to least squares theory) 
while a more general presentation, applicable to non-symmetric matrices as well, 
has been made by more recent authors. 

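The compact form of this method, extended to apply to the non-symmetric
matrix, used in this paper is as follows:

Given the matrix

$$\mathbf{a} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix} \tag{1}$$

we compute

$$\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
b_{21} & a_{22\cdot1} & a_{23\cdot1} & \cdots & a_{2n\cdot1} \\
b_{31} & b_{32\cdot1} & a_{33\cdot12} & \cdots & a_{3n\cdot12} \\
\vdots & & & & \vdots \\
b_{n1} & b_{n2\cdot1} & b_{n3\cdot12} & \cdots & a_{nn\cdot12\cdots n-1}
\end{bmatrix} \tag{2}$$

where

$$\begin{aligned}
b_{r1} &= a_{r1}/a_{11} \\
a_{2k\cdot1} &= a_{2k} - b_{21} a_{1k} \\
b_{r2\cdot1} &= (a_{r2} - b_{r1} a_{12})/a_{22\cdot1} \\
a_{3k\cdot12} &= a_{3k} - b_{31} a_{1k} - b_{32\cdot1} a_{2k\cdot1} \\
b_{r3\cdot12} &= (a_{r3} - b_{r1} a_{13} - b_{r2\cdot1} a_{23\cdot1})/a_{33\cdot12}
\end{aligned} \tag{3}$$

and in general

$$\begin{aligned}
a_{rk\cdot12\cdots j} &= a_{rk\cdot12\cdots j-1} - \frac{a_{rj\cdot12\cdots j-1}\; a_{jk\cdot12\cdots j-1}}{a_{jj\cdot12\cdots j-1}}, \\
b_{rk\cdot12\cdots j} &= \frac{a_{rk\cdot12\cdots j}}{a_{kk\cdot12\cdots j}}.
\end{aligned} \tag{4}$$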

It should be noted that Crout’s presentation [8] is similar to that used here 
except that Crout divides the elements of each row by the leading element while 
we divide the elements of columns. 

The notation used above, introduced by one of the authors [2], parallels that 
used extensively in multiple correlation and regression theory. It differs some- 
what from the notation used by Gauss. See [12; 69]. 

Since every $b$ is the ratio of two $a$'s it follows that every $b$ can be written in
terms of $a$'s so that the formulas can be written in terms of $a$'s alone. This is
what Gauss did although he used [ ]'s instead of $a$'s. Gauss also used letters to
indicate the primary subscripts and a single secondary subscript to indicate
the number of eliminations. Thus our $a_{22\cdot1}$ was written by Gauss as [bb, 1] and
$a_{33\cdot12}$ appeared as [cc, 2].

It is in the interest of less extensive notation and it makes our notation somewhat
closer to that introduced by Gauss if we replace

$$a_{rk\cdot12\cdots j} \ \text{by}\ a_{rk\cdot(j)}, \qquad b_{rk\cdot12\cdots j} \ \text{by}\ b_{rk\cdot(j)}.$$

This shortened notation can always be used when the secondary subscripts
include all the integers from 1 to $j$. In this modified notation the formulas (4)
become

$$\begin{aligned}
a_{rk\cdot(j)} &= a_{rk\cdot(j-1)} - \frac{a_{rj\cdot(j-1)}\; a_{jk\cdot(j-1)}}{a_{jj\cdot(j-1)}}, \\
b_{rk\cdot(j)} &= \frac{a_{rk\cdot(j)}}{a_{kk\cdot(j)}}.
\end{aligned} \tag{5}$$

3. Solution by matrix factorization. The values of matrix (2) are in general
not final answers to proposed problems but they are values from which final
answers can be computed. The matrix (2) exhibits essentially both the triangular
matrix of the $a_{rk\cdot(j)}$ which we call $\mathbf{t}$ and the triangular matrix of the $b_{rk\cdot(j)}$ which
we call $\mathbf{s}$. (The diagonal entries of the $\mathbf{s}$ matrix are all unity and do not appear.)
Hence (2) is really $\mathbf{s} - \mathbf{I} + \mathbf{t}$.

A basic property, useful in most problems involving the use of (2), is that $\mathbf{s}$
and $\mathbf{t}$ are factors of $\mathbf{a}$. Thus

$$\mathbf{a} = \mathbf{s}\mathbf{t} \quad \text{and} \quad \mathbf{a} - \mathbf{s}\mathbf{t} = 0. \tag{6}$$

That this is true in the symmetric case was proved in an earlier paper [7; 85].
That this is also true for the non-symmetric case is now shown in a similar
manner.

Let $\mathbf{t}_1$ be a matrix ($n$ by $n$) with the first row composed of elements $a_{1k}$ and
all other elements 0. Let $\mathbf{s}_1$ be a similar matrix with first column elements
$b_{r1} = a_{r1}/a_{11}$ and all other elements 0. Then $\mathbf{a} - \mathbf{s}_1\mathbf{t}_1 = \mathbf{a}_1 = (a_{rk\cdot1})$ is a matrix
($n$ by $n$) with all elements of the first column and first row 0.

Next let $\mathbf{t}_2$ be a matrix ($n$ by $n$) with the second row elements $a_{2k\cdot1}$ and all
other elements 0. Let $\mathbf{s}_2$ be a matrix ($n$ by $n$) with second column elements
$b_{r2\cdot1}$ and all other elements 0. Then $\mathbf{a}_1 - \mathbf{s}_2\mathbf{t}_2 = \mathbf{a}_2 = (a_{rk\cdot(2)})$ is a matrix ($n$ by $n$)
with each element of the first two columns and first two rows equal to 0.

This process is continued through $n$ successive steps, an additional row and
column being made identically zero at each step. We have then

$$\mathbf{a} - \mathbf{s}_1\mathbf{t}_1 - \mathbf{s}_2\mathbf{t}_2 - \cdots - \mathbf{s}_n\mathbf{t}_n = \mathbf{a}_n = 0. \tag{7}$$

Now consider the triangular matrix

$$\mathbf{t} = \mathbf{t}_1 + \mathbf{t}_2 + \mathbf{t}_3 + \cdots + \mathbf{t}_n$$

with its rows composed of the non-zero rows of the $\mathbf{t}_i$. Consider also the triangular
matrix $\mathbf{s} = \mathbf{s}_1 + \mathbf{s}_2 + \cdots + \mathbf{s}_n$. Then $\mathbf{s}\mathbf{t} = \mathbf{s}_1\mathbf{t}_1 + \mathbf{s}_2\mathbf{t}_2 + \cdots + \mathbf{s}_n\mathbf{t}_n$ since
$\mathbf{s}_i\mathbf{t}_j = 0$ for $i \ne j$; and (7) becomes

$$\mathbf{a} - \mathbf{s}\mathbf{t} = 0 \quad \text{or} \quad \mathbf{a} = \mathbf{s}\mathbf{t}.$$
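The factorization (6) can be illustrated on the 4 by 4 matrix used in Table I. The following is a minimal sketch under stated assumptions: the function name `compact_factor` is hypothetical, floating-point division stands in for the machine arithmetic of the paper, and every leading element is assumed to be non-zero (no interchange of rows is attempted).

```python
def compact_factor(a):
    """Return (s, t): s unit lower triangular with the b's, t upper triangular, a = s t."""
    n = len(a)
    work = [[float(v) for v in row] for row in a]
    s = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
    t = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Row j of t holds the current row j of the reduced matrix.
        t[j][j:] = work[j][j:]
        for r in range(j + 1, n):
            # Divide the column element by the leading (diagonal) element.
            s[r][j] = work[r][j] / work[j][j]
            for k in range(j, n):
                work[r][k] -= s[r][j] * work[j][k]
    return s, t

# Matrix (1) of Table I.
a = [[26, -10, 15, 32],
     [19, 45, -14, -8],
     [-12, 16, 27, 13],
     [32, 29, -35, 28]]
s, t = compact_factor(a)

# Product s t, to be compared with a; and the product of the diagonal of t, as in (8).
product = [[sum(s[r][j] * t[j][k] for j in range(4)) for k in range(4)] for r in range(4)]
determinant = t[0][0] * t[1][1] * t[2][2] * t[3][3]
```

The diagonal of `t` reproduces the leading elements 26, 52.308, 39.356 and 43.071 recorded in Table I, and `product` agrees with the original matrix up to rounding.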


4. Gaussian computation of inverse (and adjugate) without formal reduction
of auxiliary identity matrix. The inverse of $\mathbf{a}$, $\mathbf{a}^{-1} = \mathbf{C} = (c_{rk})$, can be calculated
directly from the matrices $\mathbf{s}$ and $\mathbf{t}$ of (2). The adjugate $\mathbf{D} = (d_{rk})$ can be calculated
by multiplication by the determinant of the matrix and this can be calculated
by the well known formula

$$\Delta = a_{11}\, a_{22\cdot1}\, a_{33\cdot(2)} \cdots a_{nn\cdot(n-1)}. \tag{8}$$

The theory is presented in some detail and illustrated for the case $n = 4$ after
which a more general matrix presentation is given. The matrix equation
$\mathbf{a}\mathbf{C} = \mathbf{I}$ is equivalent to the following $4^2$ simultaneous equations in the $4^2$
unknowns $(c_{rk})$:

$$\begin{array}{rcccc}
 & k=1 & k=2 & k=3 & k=4 \\
a_{11} c_{1k} + a_{12} c_{2k} + a_{13} c_{3k} + a_{14} c_{4k} = & 1 & 0 & 0 & 0 \\
a_{21} c_{1k} + a_{22} c_{2k} + a_{23} c_{3k} + a_{24} c_{4k} = & 0 & 1 & 0 & 0 \\
a_{31} c_{1k} + a_{32} c_{2k} + a_{33} c_{3k} + a_{34} c_{4k} = & 0 & 0 & 1 & 0 \\
a_{41} c_{1k} + a_{42} c_{2k} + a_{43} c_{3k} + a_{44} c_{4k} = & 0 & 0 & 0 & 1
\end{array} \tag{9}$$

Now since $\mathbf{C}\mathbf{a} = \mathbf{I}$ also we have $\mathbf{a}'\mathbf{C}' = \mathbf{I}$ and there results another set of $4^2$
equations in the $4^2$ unknowns $(c_{rk})$.

$$\begin{array}{rcccc}
 & r=1 & r=2 & r=3 & r=4 \\
a_{11} c_{r1} + a_{21} c_{r2} + a_{31} c_{r3} + a_{41} c_{r4} = & 1 & 0 & 0 & 0 \\
a_{12} c_{r1} + a_{22} c_{r2} + a_{32} c_{r3} + a_{42} c_{r4} = & 0 & 1 & 0 & 0 \\
a_{13} c_{r1} + a_{23} c_{r2} + a_{33} c_{r3} + a_{43} c_{r4} = & 0 & 0 & 1 & 0 \\
a_{14} c_{r1} + a_{24} c_{r2} + a_{34} c_{r3} + a_{44} c_{r4} = & 0 & 0 & 0 & 1
\end{array} \tag{10}$$

Fisher [11; 150] has shown that the equations (9) could be solved by reducing the
unit matrix on the right. One of the authors has shown how to calculate the
inverse of a symmetric matrix by Gaussian methods without reducing the unit
matrix [1]. We now show how to reduce the non-symmetric matrix similarly.
By the same process used in getting from matrix (1) to matrix (2), we can reduce
the $4^2$ equations of (9) to the $4^2$ auxiliary equations below.

$$\begin{array}{rcccc}
 & k=1 & k=2 & k=3 & k=4 \\
a_{11} c_{1k} + a_{12} c_{2k} + a_{13} c_{3k} + a_{14} c_{4k} = & 1 & 0 & 0 & 0 \\
a_{22\cdot1} c_{2k} + a_{23\cdot1} c_{3k} + a_{24\cdot1} c_{4k} = & * & 1 & 0 & 0 \\
a_{33\cdot(2)} c_{3k} + a_{34\cdot(2)} c_{4k} = & * & * & 1 & 0 \\
a_{44\cdot(3)} c_{4k} = & * & * & * & 1
\end{array} \tag{11}$$

The terms marked * can be computed by the process. However if we do not
compute these terms we have ten equations with the right hand terms either
1 or 0.







In a similar way the $4^2$ equations of (10) can be reduced to the $4^2$ auxiliary
equations below. As above we may neglect the calculation of the diagonal
terms, and of all terms below the diagonal, and still have six equations (with
terms on the right zero).

$$\begin{array}{rcccc}
 & r=1 & r=2 & r=3 & r=4 \\
c_{r1} + b_{21} c_{r2} + b_{31} c_{r3} + b_{41} c_{r4} = & * & 0 & 0 & 0 \\
c_{r2} + b_{32\cdot1} c_{r3} + b_{42\cdot1} c_{r4} = & * & * & 0 & 0 \\
c_{r3} + b_{43\cdot(2)} c_{r4} = & * & * & * & 0 \\
c_{r4} = & * & * & * & *
\end{array} \tag{12}$$

The ten equations of (11) with the six equations of (12) are sufficient for
determining the inverse matrix. Solve (11) for $k = 4$; then solve (12) for $r = 4$;
then solve (11) for $k = 3$; then solve (12) for $r = 3$; etc. Each equation can be
solved completely on the machine to give a value of a $c_{rk}$.
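The order of solution just described (equations of type (11) for $k = n$, then of type (12) for $r = n$, then $k = n - 1$, and so on) can be sketched as follows. This is an illustration under assumptions: the function name `compact_inverse` is hypothetical, floating-point arithmetic replaces the desk machine, and all leading elements are assumed non-zero. The auxiliary identity matrix is never reduced; only the known right-hand terms 1 and 0 are used.

```python
def compact_inverse(a):
    """Inverse of a from its compact factors, using only the known right-hand terms."""
    n = len(a)
    # Factor a = s t; after the loop the upper triangle of `work` holds t.
    work = [[float(v) for v in row] for row in a]
    s = [[0.0] * n for _ in range(n)]       # the b's below the diagonal
    for j in range(n):
        for r in range(j + 1, n):
            s[r][j] = work[r][j] / work[j][j]
            for k in range(j, n):
                work[r][k] -= s[r][j] * work[j][k]
    t = work

    c = [[0.0] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):
        # Equations of type (11): column k, rows i = k, k-1, ..., first row.
        for i in range(k, -1, -1):
            rhs = 1.0 if i == k else 0.0
            acc = sum(t[i][j] * c[j][k] for j in range(i + 1, n))
            c[i][k] = (rhs - acc) / t[i][i]
        # Equations of type (12): row k, columns r = k-1, ..., first column.
        for r in range(k - 1, -1, -1):
            c[k][r] = -sum(c[k][j] * s[j][r] for j in range(r + 1, n))
    return c

# Matrix (1) of Table I.
a = [[26, -10, 15, 32],
     [19, 45, -14, -8],
     [-12, 16, 27, 13],
     [32, 29, -35, 28]]
c = compact_inverse(a)
check = [[sum(a[r][j] * c[j][k] for j in range(4)) for k in range(4)] for r in range(4)]
```

Each unknown is obtained from a single equation whose other terms are already known, which is why the sixteen equations suffice; `check` is the product of the matrix and its computed inverse and should be the identity up to rounding.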

It should be noted that Gaussian methods are approximation methods since 
they are division methods. For a discussion and treatment of the errors re- 
sulting the reader is referred to papers by Hotelling [9] and Satterthwaite [10] 
to which further reference is made in the next section. 

Different forms for presentation of the results may be used. We suggest
the following form which presents first the matrix (1), then the terms of the
matrix (2). The terms of the matrix $\mathbf{C}'$ are then computed by (11) and (12)
and placed diagonally adjacent to the terms of (2). The transpose of $\mathbf{C}$ is used
so that the check multiplication by $\mathbf{a}$ may be most easily accomplished. The
result of this multiplication which next appears shows that the computed value
of $\mathbf{a}^{-1}$ is correct to three places. The final matrix of Table I gives the value of
the adjugate, $\mathbf{D}$, as found by multiplying each element of the inverse
by (26)(52.308)(39.356)(43.071) = 2,305,300 (to five places).

It is possible to check the accuracy of the entries of each row and column
of the matrix (2) separately by using a check sum to the right of each row and
at the bottom of each column. We have not taken the space to show check
sums and they are not particularly needed after one gets a little practice with
the method. In any case $\mathbf{a}\mathbf{C}$ should be computed as a final check.

A more general matrix presentation results from the use of (6). The matrix
equation $\mathbf{a}\mathbf{C} = \mathbf{I}$ becomes $\mathbf{s}\mathbf{t}\mathbf{C} = \mathbf{I}$ and hence the auxiliary equation becomes

$$\mathbf{t}\mathbf{C} = \mathbf{s}^{-1}. \tag{13}$$

Now since $\mathbf{s}$ is triangular with unit diagonal terms and zeros above the diagonal,
it follows that $\mathbf{s}^{-1}$ also has unit diagonal terms with zeros above the diagonal.
Hence we can select $\frac{n(n+1)}{2}$ equations from the $n^2$ equations of (13)
which demand no further knowledge of the entries of $\mathbf{s}^{-1}$. A similar treatment
of the matrix equation $\mathbf{a}'\mathbf{C}' = \mathbf{I}$, $\mathbf{t}'\mathbf{s}'\mathbf{C}' = \mathbf{I}$ and

$$\mathbf{s}'\mathbf{C}' = (\mathbf{t}')^{-1} \tag{14}$$

yields $\frac{n(n-1)}{2}$ equations involving zero terms of $(\mathbf{t}')^{-1}$. These two sets of
equations taken together in the proper order are sufficient for calculating the
$n^2$ values in the inverse.

It may be of interest to note that this is also a procedure for calculating
$\mathbf{t}^{-1}\mathbf{s}^{-1}$ when $\mathbf{t}$ and $\mathbf{s}$ are known without the calculation of $\mathbf{t}^{-1}$ and $\mathbf{s}^{-1}$ separately
since

$$\mathbf{C} = \mathbf{a}^{-1} = (\mathbf{s}\mathbf{t})^{-1} = \mathbf{t}^{-1}\mathbf{s}^{-1}. \tag{15}$$


TABLE I

Suggested form for calculation

Matrix (1):

      26       -10        15        32
      19        45       -14        -8
     -12        16        27        13
      32        29       -35        28

Matrix (2), with the terms of C' placed on the line beneath each row:

    26         -10         15         32
        .02873    -.00696     .01825    -.00283
      .73077    52.308    -24.962    -31.385
        .02436     .01239     .01440    -.02267
     -.46154      .21765    39.356     34.600
       -.02302     .01572     .00791     .01991
     1.23077      .78970     -.85753   43.071
       -.01519     .00419    -.02041     .02322

Check multiplication:

    1.000     0.000     0.000     0.000
    0.000     1.000     0.000     0.000
    0.000     0.000     1.000     0.000
    0.000     0.000     0.000     1.000

Adjugate:

    66231    -16045     42072     -6524
    56157     28563     33196    -52261
   -53068     36239     18235     45899
   -35018      9659    -47051     53529


5. The method of multiplication and subtraction with division. We now
present a different method, based upon the work of Hermite [15] and Chió [16]
together with important modifications suggested by the work of Dodgson [17].
Current presentations of the basic method include the “method of condensation”
[18; 45-48] and in compact forms, the “method of multiplication and subtraction”
of one of the authors [2; 197-202].

In Gaussian methods we divide each element of a column by the leading
(diagonal) element of that column. In the method of multiplication and
subtraction we use the leading element as a “pivot” forming a number of two-rowed
determinants. Thus we use the leading elements as multipliers rather
than as divisors. No divisions are made in this method. This is a very real
advantage when the elements of the original matrix contain only two (or three)
digits each and when n < 7 (or 5). In such cases we can use this method to
compute exactly the values of any minor of the determinant of the matrix and
even the adjugate itself.

It is perhaps well to mention here that error control is difficult with division 
(Gaussian) methods. Even if many significant places are carried the errors 
may be significant, cumulative, and difficult to measure. The techniques 
suggested by the papers of Hotelling [9] and Satterthwaite [10] are most useful in 
developing error control in matrix calculation. However, where accuracy is 
important, and when the number of digits is not excessive, there appears to be 
merit in calculating the exact values. 

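In the method of multiplication and subtraction, we compute from the matrix
(1) the following matrix

$$\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & A_{22\cdot1} & A_{23\cdot1} & \cdots & A_{2n\cdot1} \\
a_{31} & A_{32\cdot1} & A_{33\cdot(2)} & \cdots & A_{3n\cdot(2)} \\
\vdots & & & & \vdots \\
a_{n1} & A_{n2\cdot1} & A_{n3\cdot(2)} & \cdots & A_{nn\cdot(n-1)}
\end{bmatrix} \tag{16}$$

where

$$\begin{aligned}
A_{rk\cdot1} &= a_{11} a_{rk} - a_{1k} a_{r1}, \\
A_{rk\cdot(2)} &= A_{22\cdot1} A_{rk\cdot1} - A_{2k\cdot1} A_{r2\cdot1},
\end{aligned} \tag{17}$$

and in general

$$A_{rk\cdot(j)} = A_{jj\cdot(j-1)} A_{rk\cdot(j-1)} - A_{jk\cdot(j-1)} A_{rj\cdot(j-1)}.$$

This notation is similar to that used in connection with Gaussian methods above.

In the method of multiplication and subtraction with division, we compute
from the matrix (1) the following matrix:

$$\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & B_{22\cdot1} & B_{23\cdot1} & \cdots & B_{2n\cdot1} \\
a_{31} & B_{32\cdot1} & B_{33\cdot(2)} & \cdots & B_{3n\cdot(2)} \\
\vdots & & & & \vdots \\
a_{n1} & B_{n2\cdot1} & B_{n3\cdot(2)} & \cdots & B_{nn\cdot(n-1)}
\end{bmatrix} \tag{18}$$

where

$$\begin{aligned}
B_{rk\cdot1} &= a_{11} a_{rk} - a_{1k} a_{r1}, \\
B_{rk\cdot(2)} &= \frac{B_{22\cdot1} B_{rk\cdot1} - B_{2k\cdot1} B_{r2\cdot1}}{a_{11}}, \\
B_{rk\cdot(3)} &= \frac{B_{33\cdot(2)} B_{rk\cdot(2)} - B_{3k\cdot(2)} B_{r3\cdot(2)}}{B_{22\cdot1}},
\end{aligned} \tag{19}$$

and in general

$$B_{rk\cdot(j)} = \frac{B_{jj\cdot(j-1)} B_{rk\cdot(j-1)} - B_{jk\cdot(j-1)} B_{rj\cdot(j-1)}}{B_{j-1,j-1\cdot(j-2)}} \tag{20}$$

with $B_{rk\cdot1}$ and $B_{rk\cdot(2)}$ as defined in (19).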











In general the method calls for the calculation of entries according to the
method of multiplication and subtraction but in addition calls for the division
by the leading element of the second preceding row or column. Since this
division must be exact, as is shown in the next section, we have at each stage
a good numerical check on the work as well as an exact value of the entry.
Furthermore it is shown in the next section that the value of $B_{rk\cdot(j)}$ is the exact value
of the determinant

$$\begin{vmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1j} & a_{1k} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2j} & a_{2k} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3j} & a_{3k} \\
\vdots & & & & & \vdots \\
a_{j1} & a_{j2} & a_{j3} & \cdots & a_{jj} & a_{jk} \\
a_{r1} & a_{r2} & a_{r3} & \cdots & a_{rj} & a_{rk}
\end{vmatrix} \tag{21}$$

All the recorded entries (themselves values of determinants) are calculated on
the machine. The only limitation is the number of places the machine provides.
For the trivial problems (composed of small integers) found in most texts of
College Algebra, one can calculate the values readily without machines. For
example the determinant

$$\begin{vmatrix}
2 & 1 & -3 & 4 \\
3 & 2 & 2 & 1 \\
-2 & -1 & 1 & 3 \\
4 & -3 & 2 & 1
\end{vmatrix}
\quad \text{yields at once} \quad
\begin{bmatrix}
2 & 1 & -3 & 4 \\
3 & 1 & 13 & -10 \\
-2 & 0 & -2 & 7 \\
4 & -10 & 73 & -397
\end{bmatrix}$$

and the value of $\Delta$ is $-397$. All the other entries are also minors of $\Delta$.

Dodgson introduced a method of multiplication and subtraction with division
as early as 1866 [17]. He however used a moving pivot. For our purposes it
seems preferable to use a fixed pivot as we suggest in this paper.
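The rule (19), (20) is readily carried out in exact integer arithmetic. The sketch below is illustrative: the function name is hypothetical, the matrix is assumed to have integer elements and non-zero leading elements, and an exception is raised if a division fails to be exact, which serves as the numerical check mentioned above. It is applied to the small example and to matrix (1) of Table I.

```python
def multiply_subtract_divide(a):
    """Compact matrix (18) and the determinant, by (19) and (20) with a fixed pivot."""
    n = len(a)
    b = [row[:] for row in a]
    divisor = 1                      # no division at the first stage
    for j in range(n - 1):
        for r in range(j + 1, n):
            for k in range(j + 1, n):
                numerator = b[j][j] * b[r][k] - b[j][k] * b[r][j]
                quotient, remainder = divmod(numerator, divisor)
                if remainder != 0:
                    raise ArithmeticError("division not exact; check the work")
                b[r][k] = quotient
        # The next stage divides by the leading element just used as pivot.
        divisor = b[j][j]
    return b, b[n - 1][n - 1]

example = [[2, 1, -3, 4],
           [3, 2, 2, 1],
           [-2, -1, 1, 3],
           [4, -3, 2, 1]]
compact, det_example = multiply_subtract_divide(example)

table_one = [[26, -10, 15, 32],
             [19, 45, -14, -8],
             [-12, 16, 27, 13],
             [32, 29, -35, 28]]
_, det_table_one = multiply_subtract_divide(table_one)
```

For the small example the compact matrix ends in the entry $-397$; for the matrix of Table I the exact determinant is 2,305,327, in agreement with the rounded value 2,305,300 obtained there by the division method.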


6. Proofs of theorems involving the $B_{rk\cdot(j)}$.

(a) First theorem. We first prove that the numerator $B_{jj\cdot(j-1)} B_{rk\cdot(j-1)} -
B_{jk\cdot(j-1)} B_{rj\cdot(j-1)}$ in the definition of $B_{rk\cdot(j)}$ is exactly divisible by the denominator
$B_{j-1,j-1\cdot(j-2)}$. To do this we expand the terms of this numerator of (20) with
the continued use of

$$B_{rk\cdot(j-1)} = \frac{B_{j-1,j-1\cdot(j-2)} B_{rk\cdot(j-2)} - B_{j-1,k\cdot(j-2)} B_{r,j-1\cdot(j-2)}}{B_{j-2,j-2\cdot(j-3)}} \tag{22}$$

(which is (20) with $j$ replaced by $j - 1$) and then we multiply and cancel. It
is found that $B_{j-1,j-1\cdot(j-2)}$ is a factor of all non-cancellable terms so the exact
divisibility is proved.

(b) Second theorem. We next prove that $B_{rk\cdot(j)}$ is the value of the determinant
(21). We illustrate first for $j = 3$ and then give a more general proof. When
$j = 3$

$$\begin{vmatrix}
a_{11} & a_{12} & a_{13} & a_{1k} \\
a_{21} & a_{22} & a_{23} & a_{2k} \\
a_{31} & a_{32} & a_{33} & a_{3k} \\
a_{r1} & a_{r2} & a_{r3} & a_{rk}
\end{vmatrix}
= \frac{1}{a_{11}^{2}}
\begin{vmatrix}
B_{22\cdot1} & B_{23\cdot1} & B_{2k\cdot1} \\
B_{32\cdot1} & B_{33\cdot1} & B_{3k\cdot1} \\
B_{r2\cdot1} & B_{r3\cdot1} & B_{rk\cdot1}
\end{vmatrix}
= \frac{1}{B_{22\cdot1}}
\begin{vmatrix}
B_{33\cdot(2)} & B_{3k\cdot(2)} \\
B_{r3\cdot(2)} & B_{rk\cdot(2)}
\end{vmatrix}
= B_{rk\cdot(3)}.$$

In the more general case we designate the determinant (21) by $|a_{rk}|$ and reduce
the order by the “condensation” method just illustrated. It is understood
that the values of $B_{rk\cdot(j)}$ used in the following proof have primary subscripts
larger than secondary subscripts since the rank of the resulting determinant
decreases with each condensation

$$|a_{rk}| = \frac{1}{a_{11}^{\,j-1}}\, |B_{rk\cdot1}| = \frac{1}{B_{22\cdot1}^{\,j-2}}\, |B_{rk\cdot(2)}| = \cdots = \frac{1}{B_{j-1,j-1\cdot(j-2)}}\, |B_{rk\cdot(j-1)}| = B_{rk\cdot(j)}. \tag{23}$$

It is to be noted that the first theorem, since each $B_{rk\cdot(j)}$ can be interpreted as a
determinant by the second theorem, is a corollary of a well known theorem
[19; 33]. In a conventional determinantal notation it might appear as

$$\Delta\, \Delta_{rj;jk} = \Delta_{rk} \Delta_{jj} - \Delta_{rj} \Delta_{jk} \tag{24}$$

where the first subscripts indicate deleted rows and the second subscripts deleted
columns.

(c) Third theorem. We next relate the values of $B_{rk\cdot(j)}$ and the values $a_{rk\cdot(j)}$
and $b_{rk\cdot(j)}$. With the use of the second theorem (23) and (8) we have

$$\frac{B_{rk\cdot(j)}}{a_{rk\cdot(j)}} = \frac{a_{11}\, a_{22\cdot1}\, a_{33\cdot(2)} \cdots a_{jj\cdot(j-1)}\, a_{rk\cdot(j)}}{a_{rk\cdot(j)}} = B_{jj\cdot(j-1)} \tag{25}$$

and with the additional use of (4)

$$\frac{B_{rk\cdot(j)}}{b_{rk\cdot(j)}} = \frac{a_{11}\, a_{22\cdot1}\, a_{33\cdot(2)} \cdots a_{jj\cdot(j-1)}\, a_{rk\cdot(j)}}{a_{rk\cdot(j)} / a_{kk\cdot(j)}} = B_{kk\cdot(j)}. \tag{26}$$

These formulas may be written in the form

$$B_{rk\cdot(j)} = B_{jj\cdot(j-1)}\, a_{rk\cdot(j)} \tag{27}$$

$$B_{rk\cdot(j)} = B_{kk\cdot(j)}\, b_{rk\cdot(j)} \tag{28}$$





and since $B_{jj\cdot(j-1)}$ and $B_{j+1,j+1\cdot(j)}$ are diagonal terms, it follows that the matrix
(18) can be obtained from the matrix (2) by multiplication by diagonal matrices.

(d) Fourth Theorem. A fourth theorem gives explicit matrix formulation
to these results and shows how the values of the matrix (18) can be used in
factoring the matrix (1). Now (27) and (28) can be written in the form

$$\mathbf{T} = \mathbf{M}_t\, \mathbf{t} \tag{29}$$

$$\mathbf{S} = \mathbf{s}\, \mathbf{M}_s \tag{30}$$

where $\mathbf{M}_t$ is the diagonal matrix which multiplies $\mathbf{t}$ to get $\mathbf{T}$ and $\mathbf{M}_s$ is the
diagonal matrix which multiplies $\mathbf{s}$ to get $\mathbf{S}$. The values of the $\mathbf{T}$ matrix are
the values of (18) with $r \le k$ while the values of the $\mathbf{S}$ matrix are the values of
(18) with $r \ge k$. The diagonal matrix $\mathbf{M}_t$ is composed of diagonal elements
$[1, a_{11}, B_{22\cdot1}, \cdots, B_{n-1,n-1\cdot(n-2)}]$ while the matrix $\mathbf{M}_s$ is composed of diagonal
elements $[a_{11}, B_{22\cdot1}, B_{33\cdot(2)}, \cdots, B_{nn\cdot(n-1)}]$. The basic matrix factorization equation
(6) then appears as

$$\mathbf{a} = \mathbf{S}\, \mathbf{M}_s^{-1}\, \mathbf{M}_t^{-1}\, \mathbf{T}. \tag{31}$$

It is to be noted that exact values of elements of all these matrices are available
if the inverse diagonal matrices are written in fractional form, subject of
course to practical limitations such as number of places of computing machine,
etc.


7. Computation of the adjugate matrix. We now present matrix formulas
which enable one to compute the adjugate of $a$ compactly with the method of
multiplication and subtraction with division. If $|a|$ is the determinant of
$a$ and $D$ is the adjugate of $a$, we have

      $aD = |a|\, I$
      $s'tD = |a|\, I$
      $tD = |a|\, (s')^{-1}$
      $M_T\, tD = M_T\, |a|\, (s')^{-1}$
(32)  $TD = M_T\, |a|\, (s')^{-1}$

and similarly

      $a'D' = |a|\, I$
      $t'sD' = |a|\, I$
      $sD' = |a|\, (t')^{-1}$
      $M_S\, sD' = M_S\, |a|\, (t')^{-1}$
(33)  $SD' = M_S\, |a|\, (t')^{-1}$.


The computational procedure in getting the adjugate is very similar to that
used in getting the inverse in section 4. $T$ and $S$ are triangular matrices while
$s'$ and $t'$ are the matrices used before. The values of $M_T$, $[1, a_{11}, B_{22\cdot 1}, \cdots,
B_{n-1,n-1\cdot(n-2)}]$, of $M_S$, $[a_{11}, B_{22\cdot 1}, B_{33\cdot(2)}, \cdots, B_{nn\cdot(n-1)}]$, and of $|a|$ are first computed
by (18) so that $M_T |a|$ and $M_S |a|$ can be calculated. Without further calculation
we are able to select $n(n+1)/2$ equations from the matrix equation (32)
having known coefficients on the right ($n(n-1)/2$ of which are zero) and $n(n-1)/2$
equations from the matrix equation (33) having zero coefficients on the right.
These constitute the $n^2$ equations necessary to determine the $n^2$ values of $d_{rk}$.
These values of $d_{rk}$ can all be calculated directly on the machine and, what is
more useful in discovering calculational errors, the divisions yielding the $d_{rk}$
must be exact.

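For $n = 4$ these $n^2$ equations are

(34)
$a_{11} d_{1k} + a_{12} d_{2k} + a_{13} d_{3k} + a_{14} d_{4k} = |a|$ for $k = 1$, and $= 0$ for $k = 2, 3, 4$
$B_{22\cdot 1} d_{2k} + B_{23\cdot 1} d_{3k} + B_{24\cdot 1} d_{4k} = a_{11}|a|$ for $k = 2$, and $= 0$ for $k = 3, 4$
$B_{33\cdot(2)} d_{3k} + B_{34\cdot(2)} d_{4k} = B_{22\cdot 1}|a|$ for $k = 3$, and $= 0$ for $k = 4$
$B_{44\cdot(3)} d_{4k} = B_{33\cdot(2)}|a|$ for $k = 4$

(35)
$a_{11} d_{r1} + a_{21} d_{r2} + a_{31} d_{r3} + a_{41} d_{r4} = 0$ for $r = 2, 3, 4$
$B_{22\cdot 1} d_{r2} + B_{32\cdot 1} d_{r3} + B_{42\cdot 1} d_{r4} = 0$ for $r = 3, 4$
$B_{33\cdot(2)} d_{r3} + B_{43\cdot(2)} d_{r4} = 0$ for $r = 4$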


The process is similar to that of section 4. An illustration for the case $n = 4$
is given in Table II. The matrix of the B's is directly below the matrix $a$ and
the calculated values of the elements of $D'$ (obtained by solving (34) and (35))
are placed diagonally in the cells with the B's. The values of the transpose of
$D$ are used so that the check, premultiplication by $a$, is easily carried out. The
next matrix in Table II exhibits $aD = |a| I$. The last matrix of Table II
is a five decimal place approximation to $C'$ which is obtained by dividing the
entries of $D'$ by $|a|$. Since we know these are the correct five decimal place
values of $C'$, we may compare the corresponding values of Table I to see how
much those are in error. It should be noticed that the approximation to $C'$ may
be readily carried to more than five decimal places if desired.
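The B values quoted in Table II can be reproduced with a short fraction-free elimination. The sketch below is an illustration only: it assumes that $B_{rk\cdot(j)}$ is the determinant of the leading $j \times j$ block of $a$ bordered by row $r$ and column $k$, that no pivot vanishes, and that the matrix `A` is the one read from the legible entries of Table II. The function name `b_matrix` is hypothetical.

```python
def b_matrix(a):
    """Fraction-free elimination on a square integer matrix.

    On return, entry [r][k] (0-based) holds the determinant of the leading
    m x m block of `a` bordered by row r and column k, where m = min(r, k).
    Assumes every pivot is nonzero.
    """
    n = len(a)
    b = [row[:] for row in a]
    prev = 1  # pivot of the previous stage (1 before the first stage)
    for j in range(n - 1):
        for r in range(j + 1, n):
            for k in range(j + 1, n):
                num = b[j][j] * b[r][k] - b[r][j] * b[j][k]
                # For integer input this division must be exact; a remainder
                # would signal an arithmetic slip.
                assert num % prev == 0
                b[r][k] = num // prev
        prev = b[j][j]
    return b


# Matrix a as read from Table II (an assumption about the garbled table).
A = [[26, -10, 15, 32],
     [19, 45, -14, -8],
     [-12, 16, 27, 13],
     [32, 29, -35, 28]]

B = b_matrix(A)
print(B[1][1], B[1][3], B[2][1], B[2][3], B[3][1], B[3][3])
# prints: 1360 -816 296 47056 1074 2305327
```

The last diagonal entry is the determinant $|a|$, and the exactness assertion plays the role of the check the authors mention: a division that leaves a remainder points to an error.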

As with the Gaussian methods, it is possible here, also, to check each row 
and column individually by using check sums. 

The work necessary for the computation of the adjugate from the matrix of
the B's can be shortened somewhat by the use of the fact that the adjugate is
composed of the cofactors of the $a$. Now the cofactors of the four terms in
the lower right hand corner are $d_{n-1,n-1} = B_{nn\cdot(n-2)}$; $d_{n-1,n} = -B_{n-1,n\cdot(n-2)}$;
$d_{n,n-1} = -B_{n,n-1\cdot(n-2)}$; and $d_{nn} = B_{n-1,n-1\cdot(n-2)}$ and these are available from the
calculation of the B's though $B_{nn\cdot(n-2)}$ is not recorded. (See the lower right
four entries of the B's and a's in Table II.) With these four values
immediately available, the use of but $n^2 - 4$ additional equations is demanded,
or this additional information can be used in checking.


TABLE II
Suggested form for computation of adjugate (with check) and then inverse

 26   -10    15    32
 19    45   -14    -8
-12    16    27    13
 32    29   -35    28

—10 32 

66233 
1360 —816 

56151 
296 47056 

— 53068 
1074 2305327 


— 35013 9659 


2305327 





0 
2305327 
0 
0 


0 
2305327 

0 
-01825 
.01440 
00791 
— .02041 


— .00695 
01239 
.01572 
00419 
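The check $aD = |a| I$ and the four lower-right cofactors can be verified directly. This is a minimal sketch, not the authors' compact procedure: it builds the adjugate from cofactors by Laplace expansion, which is adequate only for small $n$, and it again assumes the matrix `A` read from Table II. The helper names are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for k, x in enumerate(m[0]):
        minor = [row[:k] + row[k + 1:] for row in m[1:]]
        total += (-1) ** k * x * det(minor)
    return total


def adjugate(a):
    """Adjugate D of a: the cofactor of a[r][k] is stored at D[k][r]."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    for r in range(n):
        for k in range(n):
            minor = [row[:k] + row[k + 1:] for i, row in enumerate(a) if i != r]
            d[k][r] = (-1) ** (r + k) * det(minor)
    return d


# Matrix a as read from Table II (an assumption about the garbled table).
A = [[26, -10, 15, 32],
     [19, 45, -14, -8],
     [-12, 16, 27, 13],
     [32, 29, -35, 28]]

D = adjugate(A)
det_a = det(A)
product = [[sum(A[i][j] * D[j][k] for j in range(4)) for k in range(4)]
           for i in range(4)]
identity_check = all(product[i][k] == (det_a if i == k else 0)
                     for i in range(4) for k in range(4))
print(det_a, identity_check)           # prints: 2305327 True
print(D[2][2], D[2][3], D[3][2], D[3][3])
# prints: 18224 -47056 45899 53524
```

Dividing by $|a|$ gives, for instance, $-47056/2305327 \approx -.02041$ and $18224/2305327 \approx .00791$, which agree with two of the legible five-place entries of the table.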


REFERENCES

[1] F. V. Waugh, "A simplified method of determining multiple regression constants", Jour. Am. Stat. Assn., Vol. 30 (1935), pp. 694-700.
[2] P. S. Dwyer, "The solution of simultaneous equations", Psychometrika, Vol. 6 (1941), pp. 101-129.
[3] P. S. Dwyer, "The evaluation of determinants", Psychometrika, Vol. 6 (1941), pp. 191-204.
[4] P. S. Dwyer, "The Doolittle technique", Annals of Math. Stat., Vol. 12 (1941), pp. 449-458.
[5] P. S. Dwyer, "The evaluation of linear forms", Psychometrika, Vol. 6 (1941), pp. 355-365.
[6] P. S. Dwyer, "Recent developments in correlation technique", Jour. Am. Stat. Assn., Vol. 37 (1942), pp. 441-460.
[7] P. S. Dwyer, "A matrix presentation of least squares and correlation theory with matrix justification of improved methods of solution", Annals of Math. Stat., Vol. 15 (1944), pp. 82-89.
[8] P. D. Crout, "A short method for evaluating determinants and solving systems of linear equations with real or complex coefficients", Am. Institute of Electrical Engineers, 33 West 39 Street, New York City, or Marchant Methods MM-182, Sept., 1941, Marchant Calculating Machine Co., Oakland, Calif.
[9] Harold Hotelling, "Some new methods in matrix calculation", Annals of Math. Stat., Vol. 14 (1943), pp. 1-34.
[10] F. E. Satterthwaite, "Error control in matrix calculation", Annals of Math. Stat., Vol. 15 (1944), pp. 373-387.
[11] R. A. Fisher, Statistical Methods for Research Workers, 5th edition, London, 1934, p. 150.
[12] C. F. Gauss, "Supplementum theoriae combinationis observationum erroribus minimis obnoxiae", Göttingen, 1873, Werke Band IV.
[13] M. H. Doolittle, "Method employed in the solution of normal equations and the adjustment of triangulation", U. S. Coast Guard and Geodetic Survey Report, 1878, pp. 115-120.
[14] A. C. Aitken, "Studies in practical mathematics. I. The evaluation, with applications, of a certain triple product matrix", Proc. Royal Soc. Edinburgh, Vol. 57 (1937), pp. 172-181.
[15] C. Hermite, "Sur une question relative à la théorie des nombres", Journal de Mathématiques pures et appliquées, (1849). Also Oeuvres, Tome I, pp. 265-273.
[16] F. Chiò, "Mémoire sur les fonctions connues sous le nom de résultantes ou de déterminants", Turin, 1853.
[17] C. L. Dodgson, "Condensation of determinants", Proc. Royal Soc., Vol. 15 (1866), pp. 150-155.
[18] A. C. Aitken, Determinants and Matrices, 2nd edition, Oliver and Boyd, Edinburgh, 1942.
[19] M. Bôcher, Introduction to Higher Algebra, (1907), MacMillan Co., New York.




MULTIPLE MATCHING AND RUNS BY THE SYMBOLIC METHOD 


IRVING KAPLANSKY AND JOHN RIORDAN 
New York City 


1. Introduction. The two subjects in the title have generally been treated by 
distinct methods, an excellent summary of which is given by S. S. Wilks in
Chapter X of [13]. For two-deck matching, an appreciable simplification over 
the classical work of MacMahon [7], which seems to underlie the generating 
function used by Wilks [12] and Battin [2], has been shown by one of us [5] 
to follow from symbolic methods. Here we give an elaboration of these methods 
to multiple matching and to runs. 

The basis of the symbolic method in both problems has been given in [6], 
but for completeness a skeleton resume is given in Section 2 below. A new 
point is stressed: the relation of coefficients in polynomials of the symbolic 
method to factorial moments (cf. Fréchet [4]). 

The emphasis for the most part is on showing the expedition of the symbolic 
method in reaching known results, but in several instances new results are 
obtained. 


2. Symbolic expressions and moments. Let $A_1, \cdots, A_n$ be arbitrary events
and let $p(A_{i_1}, \cdots, A_{i_k})$ denote the joint probability of $A_{i_1}, \cdots, A_{i_k}$; let
$P_r$ be the probability that exactly $r$ of the events occur. Then

(1)  $P_r = \sum_{k=0}^{n} (-1)^r\, {}_kC_r \sum (-1)^k\, p(A_{i_1}, \cdots, A_{i_k})$

and in particular

$P_0 = \sum_{k=0}^{n} \sum (-1)^k\, p(A_{i_1}, \cdots, A_{i_k}),$

or symbolically

(2)  $P_0 = [1 - p(A_1)][1 - p(A_2)] \cdots [1 - p(A_n)].$


The cases to be studied will be exclusively ones where so-called quasi-symmetry
holds, i.e., $p(A_{i_1}, \cdots, A_{i_k})$ is either 0 or a function $\phi_k$ of $k$ alone. In that
event (2) can be evaluated as follows: suppress all products that vanish, and
form a polynomial $f(E)$ by replacing each surviving term $p(A_i)$ by $E$. Then
$P_0 = f(E)\phi_0$ where $E$ is a displacement operator: $E^k \phi_0 = \phi_k$.

The same polynomial $f(E)$ can also be used to obtain $P_r$ and the moments of
the distribution. From (1) we see that $P_r = f(E)\psi_0$, where $\psi_k = (-1)^r\, {}_kC_r\, \phi_k$.
Again it is well known (Fréchet [4]) that the $k$-th factorial moment, defined by

$M_{(k)} = \sum_i i(i - 1) \cdots (i - k + 1) P_i,$

is also given by

$M_{(k)} = k! \sum p(A_{i_1}, \cdots, A_{i_k}).$

It follows that the terms of $f(E)\phi_0$ are essentially the factorial moments. More
precisely, if

$f(E) = \sum_k S_k (-E)^k,$

(3)  $M_{(k)} = k!\, S_k \phi_k.$


3. Card matching. To avoid complications which add nothing to the fundamental
idea, the case of three decks will be considered explicitly. As remarked
by Battin [2], there is no loss of generality in supposing that the three decks
have the same number of cards: let them be numbered from 1 to $n$. Let $p_{ijk}$
denote the probability that the $i$-th, $j$-th, and $k$-th cards of the three decks are
matched, that is, all occur in say the $l$-th place. The condition of quasi-symmetry
is fulfilled, the (symbolic) product of $k$ of the $p$'s being either 0 or
$\phi_k = [(n - k)!/n!]^2$.

The simplest problem is to find the probability that there be no triple matches
of the form $(i, i, i)$. Since no products of the expression

$(1 - p_{111})(1 - p_{222}) \cdots (1 - p_{nnn})$

vanish, the answer is $(1 - E)^n \phi_0$, in agreement with Anderson [1] (cf. also
problem E 589 in the American Mathematical Monthly, p. 512, 1943; solution
by John Riordan, p. 287, 1944).
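The expression $(1 - E)^n \phi_0$ can be checked numerically for small $n$. The sketch below is illustrative: it expands the operator with $E^k \phi_0 = \phi_k = [(n-k)!/n!]^2$ and compares the result with a direct count, assuming the first deck is held in natural order and the other two range over all permutations. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def p_no_triple_match(n):
    """(1 - E)^n phi_0 with E^k phi_0 = ((n - k)!/n!)^2."""
    return sum(
        Fraction((-1) ** k * comb(n, k) * factorial(n - k) ** 2,
                 factorial(n) ** 2)
        for k in range(n + 1)
    )


def brute_force(n):
    """Share of (deck 2, deck 3) orders with no place i holding card i in both."""
    perms = list(permutations(range(n)))
    good = sum(
        1
        for s in perms
        for t in perms
        if not any(s[i] == i and t[i] == i for i in range(n))
    )
    return Fraction(good, len(perms) ** 2)


print(p_no_triple_match(3), brute_force(3))  # prints: 13/18 13/18
```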

Suppose now that the decks are given compositions in the usual fashion by
having $a_1, b_1, c_1$ aces respectively, $a_2, b_2, c_2$ deuces, etc. We may number the
cards so that $1, \cdots, a_1$ are aces, $a_1 + 1, \cdots, a_1 + a_2$ are deuces, and similarly
in the other decks. The probability of precisely $r$ matches among cards of the
same denomination is then given by

(4)  $F(a_1, b_1, c_1) F(a_2, b_2, c_2) \cdots \psi_0,$

where

$F(a, b, c) = \prod (1 - p_{ijk}),$

the symbolic product being taken over ranges $i = 1, \cdots, a$, $j = 1, \cdots, b$,
$k = 1, \cdots, c$.
A simple combinatorial argument reveals that

(5)  $F(a, b, c) = \sum_t (a)_t (b)_t (c)_t (-E)^t / t!$

where $(a)_t = a(a - 1) \cdots (a - t + 1)$ is the Jordan factorial notation. The
problem of matching arbitrary decks is thus compactly solved by (4) and (5).


4. Examples. When decks of explicit structure are in question, the com- 
putation of probabilities and moments reduces to straightforward algebra, as is 
illustrated in the three following examples. 

1. Suppose each of three decks has two suits of two cards each. Then, since

$F(2, 2, 2)^2 = (1 - 8E + 4E^2)^2 = 1 - 16E + 72E^2 - 64E^3 + 16E^4,$

it follows that

$(4!)^2 P_0 = (4!)^2 - 16(3!)^2 + 72(2!)^2 - 64(1!)^2 + 16(0!)^2$
$= 576 - 576 + 288 - 64 + 16 = 240,$

and the calculation of $(4!)^2 P_r$ may be set forth as follows:

576 - 576 + 288 -  64 + 16 = 240
      576 - 576 + 192 - 64 = 128
            288 - 192 + 96 = 192
                   64 - 64 =   0
                        16 =  16


each column being obtained by multiplying its first row entry by a binomial 
coefficient. These results may be verified readily by direct enumeration. 
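The direct enumeration mentioned above is easy to carry out by machine. The following sketch builds the coefficients of $F(2,2,2)^2$ from (5), applies (1) to get $(4!)^2 P_r$, and compares with a brute-force count over all orders of the second and third decks; the first deck is held fixed, which is an assumption of the sketch, and the function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial, perm


def F(a, b, c):
    """Coefficients S_t of (-E)^t in (5): (a)_t (b)_t (c)_t / t!."""
    return [perm(a, t) * perm(b, t) * perm(c, t) // factorial(t)
            for t in range(min(a, b, c) + 1)]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def match_counts(S, n):
    """(n!)^2 P_r for r = 0..n, from the coefficients S_k of (-E)^k."""
    terms = [S[k] * factorial(n - k) ** 2 for k in range(len(S))]
    return [sum((-1) ** (k - r) * comb(k, r) * terms[k]
                for k in range(r, len(S)))
            for r in range(n + 1)]


S = poly_mul(F(2, 2, 2), F(2, 2, 2))   # coefficients of F(2,2,2)^2
symbolic = match_counts(S, 4)

# Direct enumeration: cards 0,1 form one suit and cards 2,3 the other.
suits = [0, 0, 1, 1]
direct = [0] * 5
for s in permutations(range(4)):
    for t in permutations(range(4)):
        r = sum(1 for i in range(4) if suits[i] == suits[s[i]] == suits[t[i]])
        direct[r] += 1

print(S)         # prints: [1, 16, 72, 64, 16]
print(symbolic)  # prints: [240, 128, 192, 0, 16]
print(direct)    # prints: [240, 128, 192, 0, 16]
```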
2. In the case of three 5 by 5 decks, the polynomial is

$F(5, 5, 5)^5 = (1 - 125E + 4000E^2 - 36000E^3 + 72000E^4 - 14400E^5)^5$
$= 1 - 625E + 176{,}250E^2 - 29{,}711{,}250E^3 + 3{,}346{,}063{,}125E^4 - \cdots$

The factorial moments can be obtained using (3).

$M_{(1)} = 625/25^2 = 1,$
$M_{(2)} = 2 \cdot 176250/(25^2 \cdot 24^2) = 47/48,$
$M_{(3)} = 7923/8464,$
$M_{(4)} = 1784567/2048288,$

the first two in agreement with Battin [2].
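These moments follow mechanically from (3) and (5). A minimal sketch with exact rational arithmetic is given below; it assumes three decks, so that $\phi_k = 1/(n)_k^2$, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, perm


def F(a, b, c):
    """Coefficients S_t of (-E)^t in (5): (a)_t (b)_t (c)_t / t!."""
    return [perm(a, t) * perm(b, t) * perm(c, t) // factorial(t)
            for t in range(min(a, b, c) + 1)]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def poly_pow(p, m):
    out = [1]
    for _ in range(m):
        out = poly_mul(out, p)
    return out


def factorial_moment(S, n, k):
    """(3): M_(k) = k! S_k phi_k with phi_k = 1/(n)_k^2 for three decks."""
    return Fraction(factorial(k) * S[k], perm(n, k) ** 2)


S = poly_pow(F(5, 5, 5), 5)
moments = [factorial_moment(S, 25, k) for k in range(1, 5)]
print(S[:5])    # prints: [1, 625, 176250, 29711250, 3346063125]
print(moments)
```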

3. The symbolic method can be applied to more intricate kinds of matching,
as this final example shows. Suppose that the six matches represented by
(123) and its permutations are forbidden, likewise the six matches represented
by permutations of (456), and so on in groups of three. Then

$(1 - p_{123})(1 - p_{132})(1 - p_{213})(1 - p_{231})(1 - p_{312})(1 - p_{321})$
$= 1 - 6E + 6E^2 - 2E^3,$

and so the answer is

$(1 - 6E + 6E^2 - 2E^3)^{n/3}.$

The analogous problem for 4 decks has the solution

$(1 - 24E + 108E^2 - 96E^3 + 24E^4)^{n/4}.$

The generalization to an arbitrary number of decks involves the enumeration
of Latin rectangles, in itself a formidable problem.


5. Moment formulas. It is possible to deduce from (4) and (5) fairly explicit
formulas for the factorial moments. Let us define $u^{(t)} = (a)_t (b)_t (c)_t$. Then
(5) may be written symbolically as

$F(a, b, c) = \sum_t u^{(t)} (-E)^t / t! = \exp(-uE).$

Writing $F(a_i, b_i, c_i) = \exp(-u_i E)$, we then have

$P_0 = \exp[-(u_1 + u_2 + \cdots)E]\, \phi_0 = \sum_t (u_1 + u_2 + \cdots)^t \frac{(-E)^t}{t!}\, \phi_0,$

or finally, if $m + 1$ decks are being matched,

(6)  $P_0 = \sum_t (-1)^t (u_1 + u_2 + \cdots)^t / t!\, (n)_t^m.$

It is to be borne in mind that after expansion of $(u_1 + u_2 + \cdots)^t$ by the
multinomial theorem, the term $u_1^i u_2^j u_3^k \cdots$ is replaced by $u_1^{(i)} u_2^{(j)} u_3^{(k)} \cdots$ with the
$u$'s defined as above.
By (3), factorial moments corresponding to (6) are given by

(7)  $M_{(t)} = (u_1 + u_2 + \cdots)^t / (n)_t^m.$

Thus in particular

$n^m M_{(1)} = u_1 + u_2 + \cdots = \sum a_i b_i \cdots$
$n^m (n - 1)^m M_{(2)} = (u_1 + u_2 + \cdots)^2$
$= \sum_i a_i(a_i - 1) b_i(b_i - 1) \cdots + 2 \sum_{i<j} a_i a_j b_i b_j \cdots$

the cases $m = 1, 2$ in agreement with Battin [2].
In the simple case where $m = 1$ (two decks), $a_i = b_i = a$ and $n = sa$, we have
$u^{(t)} = (a)_t^2$ and

(8)  $(n)_t M_{(t)} = (u + u + \cdots + u)^t$

with $s$ $u$'s in the parenthesis. The right of (8) is the multi-variable polynomial
of E. T. Bell [3], $Y_t(y_1, y_2, \cdots, y_t)$ with $y_i = (s)u^{(i)}$ and $(s)$ a symbolic factorial
such that $y_i y_j = (s)_2 u^{(i)} u^{(j)}$, etc. Instances of (8) may be compared with
Olds [9].

Expanding (8) we obtain

$(n)_t M_{(t)} = (s)_t [u^{(1)}]^t + {}_tC_2\, (s)_{t-1} u^{(2)} [u^{(1)}]^{t-2} + \cdots$
$= (s)_t a^{2t} + {}_tC_2\, (s)_{t-1} a^{2t-2} (a - 1)^2 + \cdots$

and, since $(s)_t/(n)_t \to a^{-t}$ as $n \to \infty$, it follows that $M_{(t)} \to a^t$, i.e., the limiting
distribution is Poisson with mean $a$. As indicated in [6] one may proceed to
obtain successive terms of an asymptotic series for the distribution. These
results generalize to the case where $M_{(1)} = \sum a_i b_i/n$ approaches a finite limit as
$n \to \infty$. In certain instances where $M_{(1)} \to \infty$, asymptotic normality can be
proved (cf. [1] and [8]).

6. Successions and runs. As shown in [6], enumeration of permutations with
a specified number of 2-successions like 12, 42, $\cdots$ may be accomplished by
introduction of symbols like $q_{12}$, $q_{42}$, denoting probabilities that 1 immediately
precede 2, 4 precede 2, resp. For permutations of objects $a_1$ of which are of one
kind, $a_2$ of a second, $\cdots$ with $a_1 + a_2 + \cdots + a_s = n$, the probability of exactly
$r$ 2-successions is ([6] p. 914)

(9)  $P_r = G(a_1) G(a_2) \cdots G(a_s) \psi_0$

with $\psi_k = (-1)^r\, {}_kC_r\, (n - k)!/n!$ and

$G(a) = \sum_{t=0}^{a-1} (a)_t (a - 1)_t (-E)^t / t!.$

It is to be noted that in deriving (9), elements of the first kind are numbered
1 to $a_1$, of the second $a_1 + 1$ to $a_1 + a_2$, $\cdots$ and a succession occurs if either
$i$ precedes $j$ or $j$ precedes $i$ with $i$ and $j$ in the same set.

For $s = 2$, i.e., two kinds of elements, there is a simpler formula due to Stevens
[10], but for the general case (9) seems to be the only reasonably explicit solution
known. In particular, for the function $F(a_1, \cdots, a_s)$ of Mood [8] which enumerates
the number of permutations with no 2-successions, we have

$F(a_1, \cdots, a_s) = n!\, G(a_1) \cdots G(a_s) \phi_0.$

Factorial moments for 2-successions are given at once by (7):

(10)  $M_{(t)} = (u_1 + u_2 + \cdots + u_s)^t / (n)_t$

with $u_i^{(t)} = (a_i)_t (a_i - 1)_t$.

It is more usual to classify permutations according to the number of runs,
say $r'$, a run consisting of a succession of $i$ like elements ($i = 1, 2, \cdots$). Since
every 2-succession causes the loss of a potential run, we have $r' = n - r$, i.e. the
number of runs is $n$ diminished by the number of 2-successions. Factorial
moments $M'_{(t)}$ for runs are then given by the usual formula for change of origin:

(11)  $M'_{(t)} = \sum_{i=0}^{t} (-1)^i\, {}_tC_i\, (n - i)_{t-i}\, M_{(i)}.$


Examples. 1. Introducing $\sigma_i$ for the $i$-th elementary symmetric function
of the $a$'s,

$\sigma_1 = a_1 + a_2 + \cdots + a_s = n,$
$\sigma_2 = a_1 a_2 + a_1 a_3 + \cdots + a_{s-1} a_s,$
$\sigma_3 = a_1 a_2 a_3 + \cdots,$

we may derive from (10) and (11) the formula

(12)  $M'_{(1)} = 1 + 2\sigma_2/n$

for the mean number of runs. The variance $\sigma^2$, the same for runs and 2-successions,
is given by

(13)  $\sigma^2 = M_{(2)} + M_{(1)} - M_{(1)}^2 = \frac{2\sigma_2(2\sigma_2 - n)}{n^2(n - 1)} - \frac{6\sigma_3}{n(n - 1)}.$

For runs of two kinds of elements, formulas (12) and (13) specialize to those
given by Wald and Wolfowitz [11].
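The mean and variance of the number of runs can be checked by enumeration for small compositions. The sketch below assumes that the variance has the closed form $[2\sigma_2(2\sigma_2 - n) - 6n\sigma_3]/[n^2(n-1)]$, which is what $M_{(2)} + M_{(1)} - M_{(1)}^2$ reduces to under (10), and that all distinct arrangements are equally likely. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations


def runs(seq):
    """Number of runs: one more than the number of adjacent unlike pairs."""
    return 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)


def brute_mean_var(counts):
    """Exact mean and variance of the number of runs by enumeration."""
    seq = [kind for kind, a in enumerate(counts) for _ in range(a)]
    # set() keeps each distinct arrangement once; all are equally likely
    values = [runs(p) for p in set(permutations(seq))]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean * mean
    return mean, var


def formula_mean_var(counts):
    """Mean from (12); variance in the closed form assumed for (13)."""
    n = sum(counts)
    s2 = sum(x * y for x, y in combinations(counts, 2))
    s3 = sum(x * y * z for x, y, z in combinations(counts, 3))
    mean = 1 + Fraction(2 * s2, n)
    var = Fraction(2 * s2 * (2 * s2 - n) - 6 * n * s3, n * n * (n - 1))
    return mean, var


for counts in [(3, 2), (2, 1, 1), (2, 2, 1)]:
    print(counts, brute_mean_var(counts), formula_mean_var(counts))
```

For two kinds of elements $\sigma_3 = 0$, so the second term drops out and only the two-kind formula remains.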
2. For runs of elements of a single kind, factors in (9) pertaining to other elements
are suppressed. Thus if $a$ is written for $a_1$, and terms in $a_2, \cdots, a_s$ are
suppressed, (9) and (10) become

$P_r = G(a)\psi_0,$
$M_{(t)} = (a)_t (a - 1)_t / (n)_t.$

Moments for runs are given by

$M'_{(t)} = \sum_{i=0}^{t} (-1)^i\, {}_tC_i\, (a - i)_{t-i}\, M_{(i)} = (a)_t (n - a + 1)_t / (n)_t$

in agreement with Mood [8].


REFERENCES
[1] T. W. Anderson, "On card matching," Annals of Math. Stat., Vol. 14 (1943), pp. 426-435.
[2] I. L. Battin, "On the problem of multiple matching," Annals of Math. Stat., Vol. 13 (1942), pp. 294-305.
[3] E. T. Bell, "Exponential polynomials," Annals of Math., Vol. 35 (1934), pp. 258-277.
[4] M. Fréchet, "Les probabilités associées à un système d'événements compatibles et dépendants," Actualités Scientifiques et Industrielles, no. 859, Paris, 1940.
[5] I. Kaplansky, "On a generalization of the problème des rencontres," Amer. Math. Monthly, Vol. 46 (1939), pp. 159-161.
[6] I. Kaplansky, "Symbolic solution of certain problems in permutations," Bull. Amer. Math. Soc., Vol. 50 (1944), pp. 906-914.
[7] P. A. MacMahon, Combinatory Analysis, Cambridge 1915, especially Vol. I, pp. 99-114.
[8] A. M. Mood, "The distribution theory of runs," Annals of Math. Stat., Vol. 11 (1940), pp. 367-392.
[9] E. G. Olds, "A moment-generating function which is useful in solving certain matching problems," Bull. Amer. Math. Soc., Vol. 44 (1938), pp. 407-413.
[10] W. L. Stevens, "Distribution of groups in a sequence of alternatives," Annals of Eugenics, Vol. 9 (1939).
[11] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same population," Annals of Math. Stat., Vol. 11 (1940), pp. 147-162.
[12] S. S. Wilks, Statistical Aspects of Experiments in Telepathy, a lecture delivered to the Galois Institute of Mathematics, Long Island University, Dec. 4, 1937.
[13] S. S. Wilks, Mathematical Statistics, Princeton, 1943.











ON THE POWER FUNCTIONS OF THE $E^2$-TEST AND THE $T^2$-TEST


By P. L. Hsu 
National University of Peking 


1. The general linear hypothesis. Every linear hypothesis about a $p$-variate
normal population or several such populations having common variances and
covariances is reducible to the following canonical form [4]: The sample distribution,
when nothing whatever has been discarded from the whole sample, being

(1)  $(2\pi)^{-\frac12 p(n_2+n)} |\alpha_{ij}|^{\frac12(n_2+n)} \exp\Big\{ -\tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{r=1}^{n_2} (y_{ir} - \eta_{ir})(y_{jr} - \eta_{jr}) - \tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{s=1}^{n} z_{is} z_{js} \Big\} \prod dy\, dz$
$(n \ge p),$

where the $\eta_{ir}$ and the $\alpha_{ij}$ are unknown, the hypothesis to be tested is

$H$: $\eta_{ir} = 0$ $(i = 1, \cdots, p;\ r = 1, \cdots, n_1,\ n_1 \le n_2).$


It is clear that the $y_{ir}$ $(i = 1, \cdots, p;\ r = n_1 + 1, \cdots, n_2)$ can have no use.
Also, the only useful quantities supplied by the set $z_{is}$ are the statistics

$b_{ij} = \sum_{s=1}^{n} z_{is} z_{js},$

because the remaining quantities may be regarded as a set of angles which are
independent of $y_{ir}$ and the $b_{ij}$ and which has a known distribution free from any
unknown parameter in (1), [2]. After discarding the irrelevant $y$'s and the angles
there results the reduced sample distribution

$K |\alpha_{ij}|^{\frac12(n_1+n)} |b_{ij}|^{\frac12(n-p-1)} \exp\Big\{ -\tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{r=1}^{n_1} (y_{ir} - \eta_{ir})(y_{jr} - \eta_{jr}) - \tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} b_{ij} \Big\} \prod dy\, db.$

Hereafter the indices $i$, $j$ and $r$ shall have the following ranges:

$i, j = 1, \cdots, p, \quad r = 1, \cdots, n_1,$

and the convention that repetition of an index indicates summation will be
adopted. Writing

$a_{ij} = y_{ir} y_{jr}, \quad c_{ij} = a_{ij} + b_{ij},$

we obtain the distribution of the $y_{ir}$ and the $c_{ij}$:

(2)  $K |\alpha_{ij}|^{\frac12(n_1+n)} |c_{ij} - a_{ij}|^{\frac12(n-p-1)} \exp(-\tfrac12 \alpha_{ij} c_{ij} + \alpha_{ij} y_{ir} \eta_{jr} - \tfrac12 \alpha_{ij} \eta_{ir} \eta_{jr}) \prod dy\, dc.$













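In the remaining two sections of this paper we deal exclusively with the
special cases $p = 1$ and $n_1 = 1$. According as $p = 1$ or $n_1 = 1$ we shall drop
the indices $i$ and $j$ or the index $r$.

The case $p = 1$. When $p = 1$, (2) reduces to

$K \alpha^{\frac12(n_1+n)} (c - y_r y_r)^{\frac12(n-2)} \exp(-\tfrac12 \alpha c + \alpha y_r \eta_r - \tfrac12 \alpha \eta_r \eta_r)\, dc \prod dy.$

Putting $y_r = c^{1/2} x_r$, we obtain

(3)  $K \alpha^{\frac12(n_1+n)} c^{\frac12(n_1+n)-1} (1 - x_r x_r)^{\frac12(n-2)} \exp(-\tfrac12 \alpha c + \alpha c^{1/2} x_r \eta_r - \tfrac12 \alpha \eta_r \eta_r)\, dc \prod dx.$

The hypothesis $H$ is now

$H'$: $\eta_r = 0$ $(r = 1, \cdots, n_1).$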
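If $w$ is any critical region for the rejection of $H'$, denote by $w(c)$ the cross
section of $w$ for every fixed $c$. Then the power function of $w$ is

(4)  $\beta_w(\eta, \alpha) = \beta_w(\eta_1, \cdots, \eta_{n_1}, \alpha) = K \alpha^{\frac12(n_1+n)} e^{-\frac12 \alpha \eta_r \eta_r} \int_0^\infty c^{\frac12(n_1+n)-1} e^{-\frac12 \alpha c}\, dc \int_{w(c)} (1 - x_r x_r)^{\frac12(n-2)} e^{\alpha c^{1/2} x_r \eta_r} \prod dx.$

It is known [3] that, in order to have

(5)  $\beta_w(0, \alpha) = \epsilon$

for all $\alpha$, it is necessary and sufficient that

(6)  $\int_{w(c)} (1 - x_r x_r)^{\frac12(n-2)} \prod dx = A\epsilon,$

where $A$ is a constant.
The $E^2$-test is the test based on the critical region

$w_0$: $x_r x_r = c^{-1} y_r y_r = E^2 \ge$ const.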
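For $p = 1$ the statistic is simple to compute. The sketch below is an illustration under stated assumptions: `y` holds the $n_1$ values $y_r$, `z` holds the $n$ values whose sum of squares is $b$, so that $c = y_r y_r + b$. The comparison with the usual variance ratio $F$, through $E^2 = n_1 F/(n + n_1 F)$, is a standard identity added here for orientation and is not taken from the text; the function names are hypothetical.

```python
def e_squared(y, z):
    """E^2 = y_r y_r / c with c = y_r y_r + b and b = sum of z_s^2."""
    yy = sum(v * v for v in y)
    b = sum(v * v for v in z)
    return yy / (yy + b)


def variance_ratio(y, z):
    """Usual F ratio: mean square of the y's over mean square of the z's."""
    return (sum(v * v for v in y) / len(y)) / (sum(v * v for v in z) / len(z))


y = [1.0, 2.0]             # n_1 = 2
z = [1.0, 1.0, 2.0, 2.0]   # n = 4
E2 = e_squared(y, z)
Fval = variance_ratio(y, z)
n1, n = len(y), len(z)
print(E2, n1 * Fval / (n + n1 * Fval))
```

Since $E^2$ is an increasing function of $F$, the region $E^2 \ge$ const. is the same as $F \ge$ const.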


The author has proved [3] that of all the critical regions which satisfy (5) and
whose power function is a function of $\alpha \eta_r \eta_r$ alone, the region $w_0$ is the uniformly
most powerful one. This result is generalized by Wald [7], who proved that, of
all the regions satisfying (5), the surface integral

$\gamma_w(\alpha, \lambda) = \int_{\eta_r \eta_r = \lambda} \beta_w(\eta, \alpha)\, dA$

is maximum when $w$ is $w_0$. The author gives here another proof of Wald's
theorem which is easier as it dispenses with the somewhat intricate Lemma 1
of Wald. From (4) we have

$\gamma_w(\alpha, \lambda) = K \alpha^{\frac12(n_1+n)} \int_0^\infty c^{\frac12(n_1+n)-1} e^{-\frac12 \alpha c}\, dc \int_{w(c)} (1 - x_r x_r)^{\frac12(n-2)} \prod dx \int_{\eta_r \eta_r = \lambda} \exp(-\tfrac12 \alpha \eta_r \eta_r + \alpha c^{1/2} x_r \eta_r)\, dA.$














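By means of a rotation in the space of $(\eta_1, \cdots, \eta_{n_1})$ we can obtain

$\int_{\eta_r \eta_r = \lambda} \exp(-\tfrac12 \alpha \eta_r \eta_r + \alpha c^{1/2} x_r \eta_r)\, dA = \int_{\zeta_r \zeta_r = \lambda} \exp(-\tfrac12 \alpha \zeta_r \zeta_r + \alpha c^{1/2} (x_r x_r)^{1/2} \zeta_1)\, dA = \sum_{k=0}^{\infty} a_k (c\, x_r x_r)^k,$

where $a_k$ depends only on $\alpha$, $k$ and $\lambda$. Hence

(7)  $\gamma_w(\alpha, \lambda) = \sum_{k=0}^{\infty} b_k \int_0^\infty c^{\frac12(n_1+n)+k-1} e^{-\frac12 \alpha c}\, dc \int_{w(c)} (x_r x_r)^k (1 - x_r x_r)^{\frac12(n-2)} \prod dx,$

where $b_k$ depends only on $k$, $\alpha$ and $\lambda$. Since $w(c)$ satisfies (6), it follows from a
lemma of Neyman and Pearson [5] that

$\int_{w(c)} (x_r x_r)^k (1 - x_r x_r)^{\frac12(n-2)} \prod dx$

is maximum, for all $c$ and $k$, when $w(c)$ is the region $x_r x_r \ge$ const., i.e. when $w$
is itself the region $x_r x_r \ge$ const. This proves Wald's theorem.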
Still another optimum property of the $E^2$-test may be established on using
the volume integral instead of the surface integral. This is stated in the following
theorem.
THEOREM 1. Let $S$ be any linear set and let

$\phi_w(\alpha, S) = \int_{\eta_r \eta_r \in S} \beta_w(\eta, \alpha) \prod d\eta.$

Of all the regions satisfying (5), the region $w_0$ has the maximum $\phi_w(\alpha, S)$.
For, by the same computation which leads to (7), we easily obtain

$\phi_w(\alpha, S) = \sum_{k=0}^{\infty} c_k \int_0^\infty c^{\frac12(n_1+n)+k-1} e^{-\frac12 \alpha c}\, dc \int_{w(c)} (x_r x_r)^k (1 - x_r x_r)^{\frac12(n-2)} \prod dx,$

where $c_k$ depends only on $k$, $\alpha$ and $S$. Hence the result follows.
This theorem also contains my previous result as a consequence. For, writing

$\beta_w(\eta, \alpha) = f(\alpha \eta_r \eta_r), \quad \beta_{w_0}(\eta, \alpha) = f_0(\alpha \eta_r \eta_r),$

we have

$0 \le \int_{\eta_r \eta_r \in S} (f_0(\alpha \eta_r \eta_r) - f(\alpha \eta_r \eta_r)) \prod d\eta = \text{const.} \int_S t^{\frac12 n_1 - 1} (f_0(\alpha t) - f(\alpha t))\, dt.$

Since $S$ is arbitrary, we must have $f(\alpha t) \le f_0(\alpha t)$.
The case $n_1 = 1$. When $n_1 = 1$, (2) and $H$ become respectively

(8)  $K |\alpha_{ij}|^{\frac12(n+1)} |c_{ij} - y_i y_j|^{\frac12(n-p-1)} \exp(-\tfrac12 \alpha_{ij} c_{ij} + \alpha_{ij} y_i \eta_j - \tfrac12 \alpha_{ij} \eta_i \eta_j) \prod dy\, dc,$

$H''$: $\eta_i = 0$ $(i = 1, \cdots, p).$

There is a unique real matrix

$T = \begin{pmatrix} t_{11} & & & \\ t_{21} & t_{22} & & \\ \cdots & \cdots & \cdots & \\ t_{p1} & t_{p2} & \cdots & t_{pp} \end{pmatrix}$  ($t_{ii} > 0$; zeros above the principal diagonal)

such that $[c_{ij}] = TT'$ [2]. Introducing the new variables $x_1, \cdots, x_p$ by means
of the transformation

(9)  $[y_1, \cdots, y_p] = [x_1, \cdots, x_p] T'$

with the Jacobian $|T| = |c_{ij}|^{1/2}$ we obtain the distribution


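(10)  $f(x, c) \prod dx\, dc = K |\alpha_{ij}|^{\frac12(n+1)} |c_{ij}|^{\frac12(n-p)} (1 - x_i x_i)^{\frac12(n-p-1)} \exp(-\tfrac12 \alpha_{ij} c_{ij} + \alpha_{ij} t_{ik} x_k \eta_j - \tfrac12 \alpha_{ij} \eta_i \eta_j) \prod dx\, dc$
$(k = 1, \cdots, p;\ t_{ik} = 0$ when $k > i).$

If $w$ is any region, we write

$\beta_w(\eta, \alpha) = \beta_w(\eta_1, \cdots, \eta_p, \alpha_{11}, \alpha_{12}, \cdots, \alpha_{pp}) = \int_w f(x, c) \prod dx\, dc,$

so that $\beta_w(\eta, \alpha)$ is the power function if $w$ serves as a critical region for rejecting
$H''$. We have, symbolically,

$w = D \times w(c),$

where $D$ is the set of points $(c_{ij})$ for which $[c_{ij}]$ is positive definite and $w(c)$ is
the cross section of $w$ for fixed $c_{ij}$. Then

$\beta_w(\eta, \alpha) = K |\alpha_{ij}|^{\frac12(n+1)} e^{-\frac12 \alpha_{ij} \eta_i \eta_j} \int_D |c_{ij}|^{\frac12(n-p)} e^{-\frac12 \alpha_{ij} c_{ij}} \prod dc \int_{w(c)} (1 - x_i x_i)^{\frac12(n-p-1)} e^{\alpha_{ij} t_{ik} x_k \eta_j} \prod dx.$

It is known [6] that, in order to have

(11)  $\beta_w(0, \alpha) = \epsilon$

for all $\alpha_{ij}$, it is necessary and sufficient that

(12)  $\int_{w(c)} (1 - x_i x_i)^{\frac12(n-p-1)} \prod dx = B\epsilon,$

where $B = \int_{x_i x_i \le 1} (1 - x_i x_i)^{\frac12(n-p-1)} \prod dx$.
The $T^2$-test is the test based on the critical region

$w_0$: $x_i x_i = c^{ij} y_i y_j = T^2/(1 + T^2) \ge$ const., or $T^2 \ge$ const.,

where $c^{ij}$ is the general element of $[c_{ij}]^{-1}$ and $T^2$ is, except for a constant factor,
Hotelling's generalization of "Student's" ratio.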
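The quantities in (9) and in the definition of $w_0$ can be traced numerically. The sketch below is illustrative only: it factors $[c_{ij}] = TT'$ with a lower-triangular $T$ having positive diagonal, solves $Tx = y$ as in (9), and compares $x_i x_i$ with $T^2/(1 + T^2)$, taking $T^2 = b^{ij} y_i y_j$ with $c_{ij} = y_i y_j + b_{ij}$. The numerical values of `b` and `y` and all function names are assumptions made for the example.

```python
from math import sqrt


def cholesky_lower(c):
    """Lower-triangular T with positive diagonal such that c = T T'."""
    p = len(c)
    t = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = c[i][j] - sum(t[i][k] * t[j][k] for k in range(j))
            t[i][j] = sqrt(s) if i == j else s / t[j][j]
    return t


def solve_lower(t, y):
    """Solve T x = y by forward substitution (the transformation (9))."""
    x = []
    for i, row in enumerate(t):
        x.append((y[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


# Illustrative data for p = 2 (assumed values, not from the paper).
b = [[4.0, 1.0], [1.0, 3.0]]
y = [1.0, 2.0]
c = [[b[i][j] + y[i] * y[j] for j in range(2)] for i in range(2)]

x = solve_lower(cholesky_lower(c), y)
xx = sum(v * v for v in x)

# T^2 = y' b^{-1} y, using the explicit 2 x 2 inverse of b
det_b = b[0][0] * b[1][1] - b[0][1] * b[1][0]
t2 = (b[1][1] * y[0] ** 2 - 2 * b[0][1] * y[0] * y[1] + b[0][0] * y[1] ** 2) / det_b
print(xx, t2 / (1 + t2))
```

Both printed numbers equal $15/26$ up to rounding, which illustrates why the regions $x_i x_i \ge$ const. and $T^2 \ge$ const. coincide.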

In order to establish an optimum property of $T^2$ analogous to that of $E^2$ given
in Theorem 1, we define, for any linear set $S$ and any region $R$ in the sample
space,

$\psi_R(S) = \int_{\alpha_{ij} \eta_i \eta_j \in S} \beta_R(\eta, \alpha) \prod d\eta\, d\alpha.$

$\psi_R(S)$ does not necessarily have a finite value, and it is this fact which renders
the following theorem less satisfactory than Theorem 1.

THEOREM 2. Let $\rho_p$ be the smallest latent root of $[c_{ij}]$ and let $E$ be any subset
of $D$ in which $\rho_p$ is at least equal to a fixed positive constant. Of all the critical
regions $w$ which satisfy (11), the region $w_0$ has the maximum $\psi_{wE}(S)$.

In order to prove this theorem we need the following two lemmas.

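LEMMA 1. If $c$ is a positive constant, the integral

$I = \int_{\rho_p \ge c} |c_{ij}|^{-(p+\frac12)} \prod dc$

has a finite value.
PROOF. Let $\rho_1, \cdots, \rho_p$ be the latent roots of $[c_{ij}]$ in the descending order
of magnitude. From a known theorem [1] we get

$I = C \int_{c \le \rho_p \le \cdots \le \rho_1 < \infty} (\rho_1 \cdots \rho_p)^{-(p+\frac12)} \prod_{i<j} (\rho_i - \rho_j) \prod d\rho < C \int_c^\infty \cdots \int_c^\infty \Big( \prod_{i=1}^{p} \rho_i^{-(i+\frac12)} \Big)\, d\rho_1 \cdots d\rho_p.$

Hence $I$ is finite.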
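LEMMA 2.

(13)  $\psi_{wE}(S) = \sum_{k=0}^{\infty} g_k \int_E |c_{ij}|^{-(p+\frac12)} \prod dc \int_{w(c)} (1 - x_i x_i)^{\frac12(n-p-1)} (x_i x_i)^k \prod dx$

and $\psi_{wE}(S)$ is finite, where $g_k$ depends only on $k$ and $S$.
PROOF. Let $A$ be the set of points $(\alpha_{ij})$ for which $[\alpha_{ij}]$ is positive definite.
By (8), we have

$\psi_{wE}(S) = K \int_{wE} |c_{ij} - y_i y_j|^{\frac12(n-p-1)} \prod dy\, dc \int_A |\alpha_{ij}|^{\frac12(n+1)} e^{-\frac12 \alpha_{ij} c_{ij}} J \prod d\alpha,$

where

$J = \int_{\alpha_{ij} \eta_i \eta_j \in S} \exp(-\tfrac12 \alpha_{ij} \eta_i \eta_j + \alpha_{ij} y_i \eta_j) \prod d\eta.$

There is a real non-singular matrix $G = [g_{ij}]$ such that $[\alpha_{ij}] = GG'$. Using the
transformation

$[\eta_1, \cdots, \eta_p] G = [\zeta_1, \cdots, \zeta_p],$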







whose Jacobian is | G |~' = | a;; |, we have 
J = ai5|° l lai exp (—355 5; + 9565s Ys) dq. 
sees 
This is reducible by means of a rotation to 


Jj = | ai; 3 [ exp (—47; T; a. CAA y;)'7)1 dr 


(14) — ” 
= | ai; [7 > dios yi ys)’, 
where 
_ i | gi gwar [- [2 “Wert ge... - (2m)? 
"™ (2k)! Jee ies 7s aT a i ae 


and d, depends only on k and S. Hence 


oo 
[ aw\""™ eo feeitei lide = Zz dy Ix , 
JA k=0 


where 
(15) i, = [ \exsl (ass ye ys) oT da. 
Now 
=n it ) a 0” 
where 


f(t) = | | as; [ eo edi Ptuin pei; ll da = K, | les — Qty: y; | | —}(n+p+l) 
4 


Ky | Ci; p Meteor Qict Yi Ys \atety | 


Hence 
(16) Ty, = ey | 45 At? (chyy)*, 


where 











Hence 


Yor(S) = K > dy. ex, [. | “— lc — Yi; m-?- (4 y, y;)* II dy de 


= > o | | o:; |??? II de / (1 — aa,)"-?” (x; 2,)* I dz, 
k=0 E w(c) > 
where g; = Kyid,e;, depends only on k and S. 
Now 


[ (i —- 2,2)” ” (a; 2,)" Il dx < II dz, 
wi(c) 


Zizi<l 


; 1—(p+4 ; — 1 
/ | cs; | F » 1 de < | :; | 3(p+}) Il de 
Ez 


i 
Ppae>od 


is finite by Lemma 1. Hence 


oo eo 
Yur(S) < const. >> d,ex = const. }, —_“___— 
k=0 k=0 (k!)? 
and so g»z(S) is finite. This proves Lemma 2. 

Proof of Theorem 2. Since $\psi_{wE}(S)$ is expressible as (13) and is always finite,
it follows from (12) and the Neyman-Pearson Lemma that $\psi_{wE}(S)$ is maximum
when $w$ is $w_0$. This proves Theorem 2.

Simaika [6] proved that of all the critical regions $w$ which satisfy the conditions
(a) $\beta_w(0, \alpha) = \epsilon$ for all $\alpha_{ij}$,
(b) $\beta_w(\eta, \alpha) = f(\alpha_{ij} \eta_i \eta_j)$,
$w_0$ is the uniformly most powerful one. Strangely enough, this result cannot
be deduced as a consequence from our Theorem 2.

The difficulty in dealing with the integral $\psi_w(S)$ is that it is not always finite.
In order to have a finite integral let us consider the following:

$\Gamma_w(\theta, S) = \int_{\alpha_{ij} \eta_i \eta_j \in S} e^{-\frac12 \theta_{ij} \alpha_{ij}} \beta_w(\eta, \alpha) \prod d\eta\, d\alpha,$

where $[\theta_{ij}]$ is a positive definite matrix. As an immediate consequence of
Simaika's theorem we have

(17)  $\Gamma_w(\theta, S) \le \Gamma_{w_0}(\theta, S)$

for any region $w$ satisfying (a) and (b). Now the question arises whether
(17) remains true if the condition (b) on $w$ is removed. The following theorem
answers this question in the negative.

THEOREM 3. Let $[\theta_{ij}]$ be a positive definite matrix, $[\rho^{ij}] = [c_{ij} + \theta_{ij}]^{-1}$ and
$\lambda_1, \cdots, \lambda_p$ be the roots of the equation $|c_{ij} - \lambda \theta_{ij}| = 0$. There is a function
$g = g(\lambda_1, \cdots, \lambda_p)$ such that the region

$w_1$: $\rho^{ij} y_i y_j \ge g(\lambda_1, \cdots, \lambda_p)$

satisfies (a) and has the maximum $\Gamma_w(\theta, S)$.





PROOF. From (10) and (14) we obtain

$\Gamma_w(\theta, S) = K \sum_{k=0}^{\infty} d_k \int_w |c_{ij} - y_i y_j|^{\frac12(n-p-1)} \prod dy\, dc \int_A |\alpha_{ij}|^{\frac12 n} (\alpha_{ij} y_i y_j)^k e^{-\frac12 \alpha_{ij}(c_{ij} + \theta_{ij})} \prod d\alpha.$

Comparing the inner integral with (15) and using (16) we get

(18)  $\Gamma_w(\theta, S) = \sum_k g_k \int_w |c_{ij} + \theta_{ij}|^{-\frac12(n+p+1)} |c_{ij} - y_i y_j|^{\frac12(n-p-1)} (\rho^{ij} y_i y_j)^k \prod dy\, dc$
$= \sum_k g_k \int_D |c_{ij} + \theta_{ij}|^{-\frac12(n+p+1)} |c_{ij}|^{\frac12(n-p)} \prod dc \int_{w(c)} (1 - x_i x_i)^{\frac12(n-p-1)} (\gamma_{ij} x_i x_j)^k \prod dx,$

where $\gamma_{ij} x_i x_j$ is the result of applying the transformation (9) on $\rho^{ij} y_i y_j$. We
shall show that, for every fixed set of $c_{ij}$, a unique number $g = g(\lambda_1, \cdots, \lambda_p)$
exists such that the region $\rho^{ij} y_i y_j = \gamma_{ij} x_i x_j \ge g$ satisfies (12), i.e.

(19)  $\int_{\gamma_{ij} x_i x_j \ge g} (1 - x_i x_i)^{\frac12(n-p-1)} \prod dx = B\epsilon.$

Since $[\gamma_{ij}] = T'[c_{ij} + \theta_{ij}]^{-1} T$, the latent roots of $[\gamma_{ij}]$ are $\lambda_i/(1 + \lambda_i)$ $(i = 1,
\cdots, p)$. Hence by a rotation the equation (19) is reduced to

(20)  $\int_{\sum_i \lambda_i \xi_i^2/(1 + \lambda_i) \ge g} (1 - \xi_i \xi_i)^{\frac12(n-p-1)} \prod d\xi = B\epsilon.$

As $g$ increases from 0 onwards, the left member of (20) decreases steadily from
$B$ to 0. Hence there is a unique $g = g(\lambda_1, \cdots, \lambda_p)$ which satisfies (20).

For this $g(\lambda_1, \cdots, \lambda_p)$ the region $w_1$ satisfies (a). Hence, applying the
Neyman-Pearson Lemma on (18) we obtain the result.

From Theorem 3 we learn that there actually exist other exact tests for $H''$
which have some optimum property not possessed by $T^2$, viz., the tests based
on the critical regions $w_1$ corresponding to various values of the $\theta_{ij}$. However,
the great difficulty in numerical computation prohibits their application and the
$T^2$-test stands out as the only test which is both simple and good.

REFERENCES
[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of Eugenics, Vol. 9 (1939), pp. 250-258.
[2] P. L. Hsu, "An algebraic derivation of the distribution of rectangular coordinates," Proc. Edin. Math. Soc., Vol. 6 (1940), pp. 185-189.
[3] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, Vol. 32 (1941), pp. 62-69.
[4] P. L. Hsu, "Canonical reduction of the general regression problem," Annals of Eugenics, Vol. 11 (1941), pp. 42-46.
[5] J. Neyman and E. S. Pearson, "Contribution to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 16 (1936), pp. 00-00.
[6] J. B. Simaika, "On an optimum property of two important statistical tests," Biometrika, Vol. 32 (1941), pp. 70-80.
[7] A. Wald, "On the power function of the analysis of variance tests," Annals of Math. Stat., Vol. 13 (1942), pp. 434-439.





SOME GENERALIZATIONS OF THE THEORY OF CUMULATIVE SUMS 
OF RANDOM VARIABLES 


By ABRAHAM WALD 


Columiia University 


1. Introduction. In a previous paper [1] the author dealt with the following 
problem: Let {z:} (¢ = 1, 2, --- , ad inf.) be a sequence of independently dis- 
tributed random variables each having the same distribution. Let a be a given 
positive constant, b a given negative constant and denote by n the smallest 
positive integer for which either 


(1) aAtes-+a2a 


or 
(2) ate ta<b 


holds. The main problems treated in [1] were: (1) Derivation of the probability 
that the cumulative sum reaches the boundary a before the boundary b is reached ; 
(2) Derivation of the characteristic function and the distribution function of n. 

In this paper we shall consider the following more general problem: Let
$K = \{k_i(z_1, \cdots, z_i)\}$ ($i = 1, 2, \cdots$, ad inf.) be a given sequence of
functions and let $n$ be the smallest positive integer for which either

(3)  $k_n(z_1, \cdots, z_n) \ge 1$

or

(4)  $k_n(z_1, \cdots, z_n) \le -1$

holds. No restrictions are imposed on the sequence $K$ except that it must be
such that the probability that $n < \infty$ is equal to one. The purpose of this
paper is to derive some theorems concerning the probability that
$k_n(z_1, \cdots, z_n) \ge 1$ and concerning the expected value of $n$. Obviously, the
problem formulated here is a generalization of that considered in [1], since the
latter can be obtained by putting

$k_i(z_1, \cdots, z_i) = \frac{2}{a - b}(z_1 + \cdots + z_i) - \frac{a + b}{a - b}.$
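The reduction just described can be sketched in Python. This is only an illustration: the function names `stopping_index`, `affine_rule` and `cumulative_sum_index` are hypothetical, integer data and exact fractions are assumed so that a sum landing exactly on a boundary is handled without rounding error, and the sample `zs` is made up.

```python
from fractions import Fraction
from itertools import accumulate

def stopping_index(zs, k):
    """Smallest n with k(z_1..z_n) >= 1 or <= -1; None if zs is exhausted first."""
    for n in range(1, len(zs) + 1):
        value = k(zs[:n])
        if value >= 1 or value <= -1:
            return n
    return None

def affine_rule(a, b):
    """k_i(z_1..z_i) = 2(z_1+...+z_i)/(a-b) - (a+b)/(a-b), for integers b < 0 < a."""
    def k(prefix):
        return Fraction(2 * sum(prefix), a - b) - Fraction(a + b, a - b)
    return k

def cumulative_sum_index(zs, a, b):
    """Smallest n with z_1+...+z_n >= a or <= b; None if zs is exhausted first."""
    for n, total in enumerate(accumulate(zs), start=1):
        if total >= a or total <= b:
            return n
    return None

# Made-up sample: partial sums are 1, -1, 1, 3, so the boundary a = 3 is hit at n = 4.
zs = [1, -2, 2, 2, -5]
n_general = stopping_index(zs, affine_rule(3, -2))
n_direct = cumulative_sum_index(zs, 3, -2)
```

Both indices agree because $k_i \ge 1$ is equivalent to $z_1 + \cdots + z_i \ge a$ and $k_i \le -1$ to $z_1 + \cdots + z_i \le b$.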


2. The conjugate distribution of z. Let $z$ be a random variable whose
distribution is equal to the common distribution of the $z_i$. In this section we shall
introduce the notion of the conjugate distribution of $z$ which will be used later.
According to Lemma 2 in [1], under some weak restrictions on the distribution
of $z$ there exists exactly one real value $h_0 \ne 0$ such that

(5)  $E(e^{h_0 z}) = 1$

where $E(u)$ denotes the expected value of $u$ for any random variable $u$.

For simplicity we shall assume that $z$ has a continuous distribution admitting
a probability density everywhere, or that $z$ has a discrete distribution. By the
probability distribution $f(z)$ of $z$ we shall mean the probability density of $z$, if
the distribution of $z$ is continuous. In the discrete case $f(z)$ will denote the
probability that the random variable takes the value $z$. From (5) it follows that

(6)  $f^*(z) = e^{h_0 z} f(z)$

is a probability distribution. We shall call $f^*(z)$ the conjugate distribution of $z$.
For any random variable $u$ we shall denote by $E^*(u)$ the expected value of $u$
under the assumption that the distribution of $z$ is given by $f^*(z)$. The expected
values $E(u)$ and $E^*(u)$ may depend on the sequence $K = \{k_i(z_1, \cdots, z_i)\}$
($i = 1, 2, \cdots$, ad inf.). Occasionally we shall put this dependence in evidence
by writing $E(u \mid K)$ and $E^*(u \mid K)$, respectively.
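For a discrete distribution the value $h_0$ and the conjugate distribution can be computed numerically. The sketch below is an illustration under assumptions: the three-point distribution `dist` is made up, the helper names are hypothetical, and the bisection bracket is chosen by hand so that it excludes the trivial root $h = 0$.

```python
import math

def mgf_minus_one(h, dist):
    """E(e^{hz}) - 1 for a discrete distribution given as {value: probability}."""
    return sum(p * math.exp(h * z) for z, p in dist.items()) - 1.0

def find_h0(dist, lo, hi, iterations=200):
    """Bisection for a root of E(e^{hz}) = 1 on [lo, hi]; the bracket must not contain 0."""
    f_lo = mgf_minus_one(lo, dist)
    f_hi = mgf_minus_one(hi, dist)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = mgf_minus_one(mid, dist)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Made-up example with E(z) = 0.1 > 0, so the nonzero root h0 is negative.
dist = {-1: 0.5, 0: 0.2, 2: 0.3}
h0 = find_h0(dist, -5.0, -0.01)

# Conjugate distribution f*(z) = e^{h0 z} f(z); it sums to one because E(e^{h0 z}) = 1.
conjugate = {z: math.exp(h0 * z) * p for z, p in dist.items()}
```

Note that the sign of the mean changes under the conjugate distribution: here $E(z) > 0$ while $E^*(z) < 0$.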

3. Two theorems. In this section we shall derive two theorems. The first
theorem is concerned with the probability that $k_n(z_1, \cdots, z_n) \ge 1$ and the
second theorem with the expected value of $n$. In what follows the operator $E_1$
will mean conditional expected value under the restriction that
$k_n(z_1, \cdots, z_n) \ge 1$ and $E_2$ will mean conditional expected value under the
restriction that $k_n(z_1, \cdots, z_n) \le -1$. If the distribution of $z$ is given by
$f^*(z)$, these conditional expected values will be denoted by the operators $E_1^*$
and $E_2^*$, respectively.

THEOREM 1. Let $K = \{k_i(z_1, \cdots, z_i)\}$ be a sequence such that the probability
that $n < \infty$ is equal to one under both distributions $f(z)$ and $f^*(z)$. Let $\gamma$ denote
the probability that $k_n(z_1, \cdots, z_n) \ge 1$ when $f(z)$ is the distribution of $z$, and let
$\gamma^*$ denote the probability of the same event when $f^*(z)$ is the distribution of $z$. Then

(7)  $E_1(e^{Z_n h_0} \mid K) = \frac{\gamma^*}{\gamma}; \qquad E_2(e^{Z_n h_0} \mid K) = \frac{1 - \gamma^*}{1 - \gamma}$

and

(8)  $E_1^*(e^{-Z_n h_0} \mid K) = \frac{\gamma}{\gamma^*}; \qquad E_2^*(e^{-Z_n h_0} \mid K) = \frac{1 - \gamma}{1 - \gamma^*}$,

where $Z_n = z_1 + \cdots + z_n$.

PROOF: From (6) it follows that

(9)  $e^{Z_n h_0} = \frac{f^*(z_1) \cdots f^*(z_n)}{f(z_1) \cdots f(z_n)}$

and

(10)  $e^{-Z_n h_0} = \frac{f(z_1) \cdots f(z_n)}{f^*(z_1) \cdots f^*(z_n)}$.

A set $(z_1, \cdots, z_n)$ will be said to be of type 1 if and only if
$-1 < k_m(z_1, \cdots, z_m) < 1$ for $m = 1, \cdots, n - 1$ and $k_n(z_1, \cdots, z_n) \ge 1$.
Similarly a set $(z_1, \cdots, z_n)$ will be said to be of type 2 if and only if
$-1 < k_m(z_1, \cdots, z_m) < 1$ for $m = 1, \cdots, n - 1$ and $k_n(z_1, \cdots, z_n) \le -1$.

We shall prove Theorem 1 under the assumption that the distribution of $z$ is
discrete. Because of (9) we have

(11)  $E_1(e^{Z_n h_0} \mid K) = \frac{1}{\gamma} \sum \left[\frac{f^*(z_1) \cdots f^*(z_n)}{f(z_1) \cdots f(z_n)}\right] f(z_1) \cdots f(z_n) = \frac{\sum f^*(z_1) \cdots f^*(z_n)}{\sum f(z_1) \cdots f(z_n)}$

where the summation is to be taken over all sets $(z_1, \cdots, z_n)$ of type 1. But
the last expression is obviously equal to $\gamma^*/\gamma$ and, therefore, the first equation in
(7) is proved. The second equation in (7) follows in the same manner if we take
into account the fact that the probability that $n < \infty$ is equal to one. Similarly,
equation (8) can be obtained from (10). The proof can easily be extended to
the case when the distribution of $z$ is continuous. Hence, Theorem 1 is proved.
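Theorem 1 can be checked numerically on a simple case. The sketch below assumes a walk with steps $+1$ (probability $p$) and $-1$ (probability $q = 1 - p$) stopped at integer boundaries $b < 0 < a$, a setting chosen here for illustration and not taken from the paper. For this walk $h_0 = \log(q/p)$, the conjugate distribution interchanges $p$ and $q$, and $Z_n$ equals $a$ or $b$ exactly at stopping, so the left sides of (7) are $e^{a h_0}$ and $e^{b h_0}$. The function name is hypothetical.

```python
import math

def hit_upper_probability(p, a, b, tol=1e-15):
    """Probability that a +1/-1 walk started at 0 (up-step probability p)
    reaches a before b, for integers b < 0 < a, by propagating the
    not-yet-absorbed probability mass step by step."""
    mass = {0: 1.0}
    upper = 0.0
    while sum(mass.values()) > tol:
        nxt = {}
        for s, m in mass.items():
            for step, pr in ((1, p), (-1, 1.0 - p)):
                t = s + step
                if t >= a:
                    upper += m * pr          # absorbed at the upper boundary
                elif t > b:
                    nxt[t] = nxt.get(t, 0.0) + m * pr
                # t <= b: absorbed at the lower boundary, mass is dropped
        mass = nxt
    return upper

p, a, b = 0.4, 3, -2
q = 1.0 - p
h0 = math.log(q / p)                              # nonzero root of E(e^{hz}) = 1
gamma = hit_upper_probability(p, a, b)            # under f
gamma_star = hit_upper_probability(q, a, b)       # under f*, which swaps p and q
lhs_type1 = math.exp(a * h0)                      # E_1(e^{Z_n h0} | K), since Z_n = a
lhs_type2 = math.exp(b * h0)                      # E_2(e^{Z_n h0} | K), since Z_n = b
```

With these values `lhs_type1` agrees with `gamma_star / gamma` and `lhs_type2` with `(1 - gamma_star) / (1 - gamma)`, as (7) requires.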
THEOREM 2. If $Ez \ne 0$, the relation

(12)  $E(n \mid K) = \frac{E(Z_n \mid K)}{Ez}$

holds for any sequence $K = \{k_i(z_1, \cdots, z_i)\}$ for which one of the following two
conditions is fulfilled:
(a) There exists an integer $N$ such that the probability that $n \le N$ is equal to one.
(b) $E(n \mid K) < \infty$ and the first four moments of $z$ are finite.

PROOF: First we shall show that condition (a) implies the validity of (12).
For any integer $j$ we shall denote $z_1 + \cdots + z_j$ by $Z_j$. Since the probability
that $n \le N$ is equal to 1, we have

(13)  $E(Z_n \mid K) + E(z_{n+1} + \cdots + z_N) = EZ_N = N\,Ez$.

Since the conditional expected value of $(z_{n+1} + \cdots + z_N)$ for a given value of
$n$ is equal to $(N - n)Ez$, we have

(14)  $E(z_{n+1} + \cdots + z_N) = E(N - n \mid K)\,Ez = N\,Ez - E(n \mid K)\,Ez$.

Equation (12) follows from (13) and (14).

Now we shall show that condition (b) implies (12). Denote by $P_N$ the
probability that $n \le N$. Let the operator $E_N$ denote conditional expected value
under the restriction that $n \le N$, and let the operator $E_N'$ denote conditional
expected value under the restriction that $n > N$. Then we have

(15)  $P_N E_N(Z_N) + (1 - P_N) E_N'(Z_N) = E(Z_N) = N\,Ez$.

Since

(16)  $E_N(Z_N) = E_N(Z_n \mid K) + E_N(z_{n+1} + \cdots + z_N \mid K)$
      $\phantom{E_N(Z_N)} = E_N(Z_n \mid K) + E_N(N - n \mid K)\,Ez$
      $\phantom{E_N(Z_N)} = E_N(Z_n \mid K) + N\,Ez - E_N(n \mid K)\,Ez$,

we obtain from (15)

(17)  $P_N\{E_N(Z_n \mid K) + N\,Ez - E_N(n \mid K)\,Ez\} + (1 - P_N) E_N'(Z_N) = N\,Ez$.

From $E(n \mid K) < \infty$ it follows that

(18)  $\lim_{N = \infty} (1 - P_N) N = 0$.

Now we shall show that (18) implies the validity of

(19)  $\lim_{N = \infty} (1 - P_N) E_N'(Z_N) = 0$.

Let $T_N = Z_N - N\,Ez$. Because of (18), (19) is proved if we can show that

(20)  $\lim_{N = \infty} (1 - P_N) E_N'(T_N) = 0$.

Denote by $R_N$ the set of all points $(z_1, \cdots, z_N)$ for which $n > N$. Then the
probability measure of $R_N$ is equal to $1 - P_N$ and

(21)  $(1 - P_N) E_N'(T_N) = \int_{R_N} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N$.

Let $R_N'$ be the part of $R_N$ in which $T_N < -N$, $R_N''$ the part of $R_N$ in which $T_N > N$
and $R_N'''$ the part of $R_N$ in which $-N \le T_N \le N$. Because of (18) we have

(22)  $\lim_{N = \infty} \left| \int_{R_N'''} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N \right| \le \lim_{N = \infty} (1 - P_N) N = 0$.

Denote the cumulative distribution function of $T_N$ by $F_N(T_N)$. Clearly,

(23)  $\left| \int_{R_N'} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N \right| \le \left| \int_{-\infty}^{-N} T_N\, dF_N(T_N) \right| \le \frac{1}{N^3} \int_{-\infty}^{\infty} T_N^4\, dF_N(T_N)$.

Since the first four moments of $z$ are finite, the 4-th moment of $T_N / \sqrt{N}$ converges
to $3\sigma^4$ where $\sigma$ is the standard deviation of $z$. Hence

(24)  $\lim_{N = \infty} \int_{-\infty}^{\infty} \frac{1}{N^2} T_N^4\, dF_N(T_N) = 3\sigma^4$.

From (23) and (24) it follows that

(25)  $\lim_{N = \infty} \int_{R_N'} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N = 0$.

Similarly we can prove that

(26)  $\lim_{N = \infty} \int_{R_N''} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N = 0$.

Equation (20) follows from (21), (22), (25) and (26). Hence (19) is proved.
From (17), (18) and (19) we obtain

(27)  $\lim_{N = \infty} P_N\{E_N(Z_n \mid K) - E_N(n \mid K)\,Ez\} = 0$.

Since $Ez \ne 0$, $\lim P_N = 1$, $\lim E_N(n \mid K) = E(n \mid K)$ and
$\lim E_N(Z_n \mid K) = E(Z_n \mid K)$, equation (12) follows from (27). Hence condition
(b) implies (12) and Theorem 2 is proved.
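Relation (12) can be checked on the same kind of example, a walk with steps $+1$ (probability $p$) and $-1$ (probability $1 - p$) stopped at integer boundaries $b < 0 < a$. This setting and the function name `walk_summary` are assumptions made for illustration. Since the walk stops exactly on a boundary, $E(Z_n \mid K) = a\gamma + b(1 - \gamma)$.

```python
def walk_summary(p, a, b, tol=1e-15):
    """Return (gamma, E(n)) for a +1/-1 walk started at 0 with up-step
    probability p, stopped on first reaching a or b (integers b < 0 < a)."""
    mass = {0: 1.0}
    gamma = 0.0
    expected_n = 0.0
    n = 0
    while sum(mass.values()) > tol:
        n += 1
        nxt = {}
        for s, m in mass.items():
            for step, pr in ((1, p), (-1, 1.0 - p)):
                t = s + step
                if t >= a:
                    gamma += m * pr
                    expected_n += n * m * pr   # stopped at step n, upper boundary
                elif t <= b:
                    expected_n += n * m * pr   # stopped at step n, lower boundary
                else:
                    nxt[t] = nxt.get(t, 0.0) + m * pr
        mass = nxt
    return gamma, expected_n

p, a, b = 0.4, 3, -2
gamma, expected_n = walk_summary(p, a, b)
mean_step = 2.0 * p - 1.0                      # Ez
mean_stopped_sum = a * gamma + b * (1.0 - gamma)  # E(Z_n | K)
wald_ratio = mean_stopped_sum / mean_step
```

The directly computed `expected_n` and the ratio `wald_ratio` agree up to the truncation tolerance.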


4. Lower limit of $E(n \mid K)$. In this section we shall derive a lower limit for
$E(n \mid K)$. First we shall prove the following lemma.

LEMMA 1. For any random variable $u$ we have

(28)  $e^{Eu} \le E e^{u}$.

PROOF: Inequality (28) can be written as

(29)  $1 \le E e^{u'}$

where $u' = u - Eu$. Lemma 1 is proved if we show that (29) holds for any
random variable $u'$ whose mean is zero. Expanding $e^{u'}$ in a Taylor series
around $u' = 0$, we obtain

$e^{u'} = 1 + u' + \frac{u'^2}{2} e^{\xi(u')}$

where $\xi(u')$ lies between 0 and $u'$. Hence

$E e^{u'} = 1 + \frac{1}{2} E\left(u'^2 e^{\xi(u')}\right) \ge 1$

and Lemma 1 is proved.

Now we are able to prove the following theorem.

THEOREM 3. Let $K = \{k_i(z_1, \cdots, z_i)\}$ be a sequence of functions such that
the probability that $n < \infty$ is one under both distributions $f(z)$ and $f^*(z)$ of $z$. Let
$\gamma$ be the probability that $k_n(z_1, \cdots, z_n) \ge 1$ when $f(z)$ is the distribution of $z$,
and let $\gamma^*$ be the probability of the same event when $f^*(z)$ is the distribution of $z$.
Then

(30)  $E(n \mid K) \ge \frac{1}{h_0 Ez} \left[ \gamma \log \frac{\gamma^*}{\gamma} + (1 - \gamma) \log \frac{1 - \gamma^*}{1 - \gamma} \right]$

and

(31)  $E^*(n \mid K) \ge \frac{1}{h_0 E^*z} \left[ \gamma^* \log \frac{\gamma^*}{\gamma} + (1 - \gamma^*) \log \frac{1 - \gamma^*}{1 - \gamma} \right]$

provided that $Ez$ and $E^*z$ are not equal to zero.

PROOF: First we shall prove Theorem 3 in the case when there exists an integer
$N$ such that the probability that $n \le N$ is one. According to Theorem 2 we have

(32)  $E(n \mid K) = \frac{E(Z_n \mid K)}{Ez} = \frac{1}{Ez} \left[ \gamma E_1(Z_n \mid K) + (1 - \gamma) E_2(Z_n \mid K) \right]$.

From Lemma 1 and Theorem 1 it follows that

(33)  $h_0 E_1(Z_n \mid K) \le \log \frac{\gamma^*}{\gamma}$ and $h_0 E_2(Z_n \mid K) \le \log \frac{1 - \gamma^*}{1 - \gamma}$.

From (32) and (33) we obtain

(34)  $h_0\, Ez\, E(n \mid K) = h_0 \left[ \gamma E_1(Z_n \mid K) + (1 - \gamma) E_2(Z_n \mid K) \right] \le \gamma \log \frac{\gamma^*}{\gamma} + (1 - \gamma) \log \frac{1 - \gamma^*}{1 - \gamma}$.

Inequality (30) follows from (34) if we can show that $h_0 E(z) < 0$. From
$E e^{h_0 z} = 1$ and Lemma 1 it follows that $h_0 E(z) \le 0$. Since $h_0 \ne 0$ and
$E(z) \ne 0$, we must have $h_0 E(z) < 0$. Hence (30) is proved. To prove (31) we
proceed as follows: From Theorem 2 we obtain

(35)  $-h_0\, E^*z\, E^*(n \mid K) = -h_0 \left[ \gamma^* E_1^*(Z_n \mid K) + (1 - \gamma^*) E_2^*(Z_n \mid K) \right]$.

From Lemma 1 and Theorem 1 it follows that

(36)  $-h_0 \left[ \gamma^* E_1^*(Z_n \mid K) + (1 - \gamma^*) E_2^*(Z_n \mid K) \right] \le \gamma^* \log \frac{\gamma}{\gamma^*} + (1 - \gamma^*) \log \frac{1 - \gamma}{1 - \gamma^*}$.

From (35) and (36) we obtain

(37)  $h_0\, E^*(z)\, E^*(n \mid K) \ge \gamma^* \log \frac{\gamma^*}{\gamma} + (1 - \gamma^*) \log \frac{1 - \gamma^*}{1 - \gamma}$.

Since $E^* e^{-h_0 z} = 1$ it follows from Lemma 1 that $-h_0 E^*z \le 0$, and since
$h_0 \ne 0$ and $E^*z \ne 0$ we have $h_0 E^*z > 0$. Inequality (31)
follows from this and (37). Hence Theorem 3 is proved in the special case when
there exists an integer $N$ such that the probability that $n \le N$ is equal to one.

To prove Theorem 3 in the general case, for any integer $N$ let the sequence
$K_N = \{k_{iN}(z_1, \cdots, z_i)\}$ be defined as follows: $k_{iN}(z_1, \cdots, z_i) = k_i(z_1, \cdots, z_i)$
for $i < N$ and $k_{iN}(z_1, \cdots, z_i) = 1$ for $i \ge N$. Denote by $\gamma_N$ and $\gamma_N^*$ the values
of $\gamma$ and $\gamma^*$, respectively, if the sequence $K$ is replaced by $K_N$. Then we have

(38)  $E(n \mid K) \ge E(n \mid K_N) \ge \frac{1}{h_0 Ez} \left[ \gamma_N \log \frac{\gamma_N^*}{\gamma_N} + (1 - \gamma_N) \log \frac{1 - \gamma_N^*}{1 - \gamma_N} \right]$

and

(39)  $E^*(n \mid K) \ge E^*(n \mid K_N) \ge \frac{1}{h_0 E^*z} \left[ \gamma_N^* \log \frac{\gamma_N^*}{\gamma_N} + (1 - \gamma_N^*) \log \frac{1 - \gamma_N^*}{1 - \gamma_N} \right]$.

Since $\lim_{N = \infty} \gamma_N = \gamma$ and $\lim_{N = \infty} \gamma_N^* = \gamma^*$, inequalities (30) and (31)
follow from (38) and (39). Hence the proof of Theorem 3 is completed.
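The lower limit (30) can be evaluated on a simple example. The sketch assumes a walk with steps $+1$ (probability $p$) and $-1$ (probability $q = 1 - p$) and integer boundaries $b < 0 < a$; this setting and the function name are illustrative assumptions. Because the walk stops exactly on a boundary, $Z_n$ is constant on each of the two types of terminal sets, Lemma 1 holds with equality, and the bound is attained.

```python
import math

def walk_summary(p, a, b, tol=1e-15):
    """Return (gamma, E(n)) for a +1/-1 walk started at 0 with up-step
    probability p, stopped on first reaching a or b (integers b < 0 < a)."""
    mass, gamma, expected_n, n = {0: 1.0}, 0.0, 0.0, 0
    while sum(mass.values()) > tol:
        n += 1
        nxt = {}
        for s, m in mass.items():
            for step, pr in ((1, p), (-1, 1.0 - p)):
                t = s + step
                if t >= a or t <= b:
                    expected_n += n * m * pr       # the walk stops at step n
                    if t >= a:
                        gamma += m * pr
                else:
                    nxt[t] = nxt.get(t, 0.0) + m * pr
        mass = nxt
    return gamma, expected_n

p, a, b = 0.4, 3, -2
q = 1.0 - p
h0 = math.log(q / p)                       # nonzero root of E(e^{hz}) = 1
gamma, expected_n = walk_summary(p, a, b)  # under f
gamma_star, _ = walk_summary(q, a, b)      # under f*, which swaps p and q
mean_step = p - q                          # Ez

# Right-hand side of (30).
lower_limit = (gamma * math.log(gamma_star / gamma)
               + (1.0 - gamma) * math.log((1.0 - gamma_star) / (1.0 - gamma))
               ) / (h0 * mean_step)
```

Here `expected_n` equals `lower_limit` up to numerical tolerance. When the sum can overshoot a boundary, the inequality in (30) is in general strict.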


5. Remarks added in proof. The results obtained in the present paper have
obvious applications to sequential analysis. These applications are, however,
not mentioned here, because at the time the present paper was submitted for
publication, sequential analysis constituted classified material. In the meantime,
the material on sequential analysis has been released and was published in
this Journal, June, 1945. The results obtained in the present paper are more
general than those obtained in connection with sequential analysis. Theorem 3,
in the present paper, implies the efficiency of the sequential probability ratio
test discussed in Section 4.7 of the paper on sequential tests.


REFERENCE


[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), pp. 283-296.











ON THE DESIGN OF EXPERIMENTS FOR WEIGHING AND MAKING 
OTHER TYPES OF MEASUREMENTS 


By K. KISHEN


Department of Agriculture, Lucknow, India


1. Introduction. In a recent paper, Hotelling [1] has discussed the basic
principles of the theory of the design of efficient experiments for estimating the
true unknown weights of $p$ given objects by means of a specified number $N$ of
weighings, $p \le N$ in case the scale is free from bias and $p \le N - 1$ if it has a
bias the unknown value of which has to be estimated from the same data. He
has emphasized the importance of these designs in other kinds of measurements
besides weighing of objects and has called attention to the need for further
mathematical research for obtaining a "comprehensive general solution." Such
a solution has now been obtained in case the number of weighings $N$ is at our
choice. Some other general designs have also been given in this paper for
specified values of $N$ and $p$.


2. Estimation of unknown weights and efficiency of a design. Using
Hotelling's notation, we may write

(1)  $E(y_\alpha) = \sum_{i=1}^{p} x_{i\alpha} b_i$,

where $i = 1, 2, \cdots, p$, on the assumption that there is either zero bias in the
scale or the bias is known a priori, and $\alpha = 1, 2, \cdots, N$. $E(y_\alpha)$ is the
expectation of the $\alpha$th weighing. For a biassed scale, we may take $i = 0, 1, 2, \cdots, p$.
The efficient estimate of each of the $b_i$'s has been derived by Hotelling by the
method of least squares. It is of interest to obtain these estimates by the use
of the theory of linear estimation as developed by Bose [2] and Rao [3].

Assuming that $y_1, y_2, \cdots, y_N$ are $N$ stochastic variates forming a
multivariate normal system with the variance and covariance matrix given by

(2)  $\Lambda = [\lambda_{ij}]$,

it follows from Rao's generalization of Markoff's theorem that the best unbiassed
estimates of the $b_i$'s are given by the solutions of the normal equations

(3)  $X'\Lambda^{-1}XB' = X'\Lambda^{-1}Y'$,

where $B = [b_1 b_2 \cdots b_p]$ and $Y = [y_1 y_2 \cdots y_N]$, and $B'$ and $Y'$ denote as usual
the transpose of the row vectors $B$ and $Y$, i.e. column vectors.

In the present case, the assumption is that all the $N$ stochastic variates are
uncorrelated and have a common variance $\sigma^2$, so that

(4)  $\Lambda = \sigma^2 I$.

Hence the normal equations in (3) reduce to

(5)  $X'XB' = X'Y'$,

which are exactly the same as the normal equations given by Hotelling, since

(6)  $X'X = [a_{ij}]$

where $a_{ij} = S(x_{i\alpha} x_{j\alpha})$.

Let $C = [c_{ij}]$ denote the reciprocal of the matrix $X'X$, so that $V(b_i) = c_{ii}\sigma^2$
and $\mathrm{cov}(b_i b_j) = c_{ij}\sigma^2$. Then the mean variance of the $p$ unknowns for a design
is given by

(7)  $\sigma_m^2 = \frac{\sigma^2}{N} \cdot \frac{N \sum_{i=1}^{p} c_{ii}}{p}$.

If the main object of the experiment is to estimate the unknowns with the
least variance, the most efficient design (for a specified value of $N$) would be
the one for which the minimum minimorum of $\sigma^2/N$ is attained for all the $p$
unknowns so that the mean variance in this case is $\sigma^2/N$. The factor,
$N \sum_{i=1}^{p} c_{ii}/p$, on the right-hand side of (7), therefore, measures the increase in
variance resulting from the adoption of any design other than the most efficient
design. Its reciprocal, $p \big/ \left(N \sum_{i=1}^{p} c_{ii}\right)$, may appropriately be defined as the
efficiency of a given design for providing estimates of the $p$ unknowns. This
quantity will now be utilized for judging the relative precision of the general
designs discussed in the subsequent paragraphs.
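The efficiency just defined, $p / (N \sum c_{ii})$ with $C = (X'X)^{-1}$, can be computed exactly for small designs. The following is a minimal sketch using exact rational arithmetic; the function names and the two example matrices are illustrative assumptions, with rows of `X` standing for weighings and columns for unknowns.

```python
from fractions import Fraction

def gram(X):
    """Return X'X for a design matrix X given as a list of N rows."""
    p = len(X[0])
    return [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(p)]
            for i in range(p)]

def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic; raises if A is singular."""
    n = len(A)
    M = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X'X is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def efficiency(X):
    """p / (N * sum of c_ii), where C is the reciprocal of X'X."""
    N, p = len(X), len(X[0])
    C = inverse(gram(X))
    return Fraction(p) / (N * sum(C[i][i] for i in range(p)))

# A 4 x 3 design with mutually orthogonal +1/-1 columns (chosen for illustration).
orthogonal = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# Weighing each of three objects once, singly.
singly = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
eff_orthogonal = efficiency(orthogonal)
eff_singly = efficiency(singly)
```

The orthogonal design attains efficiency 1, while weighing the three objects singly gives 1/3, since each variance is then $\sigma^2$ instead of $\sigma^2/N$.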


3. Design for $N = 2^m$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (non-zero bias).
By utilizing the properties of a 2-sided $m$-fold completely orthogonalized
Hyper-Graeco-Latin hyper-cube of the first order introduced by the author [4], it is
easy to see that for $N = 2^m$, $p \le 2^m$ (when there is zero bias) or $p \le 2^m - 1$
(when there is bias), $m$ being any positive integer, a completely orthogonalized
design can be constructed with each unknown weight estimated with the minimum
variance $\sigma^2/N$. As remarked by Hotelling in the case of $N = 4$, $p = 4$
(for zero bias) or $p = 3$ (if there is bias), the matrix $X'X$ for this design is a
scalar matrix of order $p \times p$ if there is zero bias, or of order $(p + 1) \times (p + 1)$
if there is bias, each of the diagonal elements being $N$. The reciprocal matrix
is also a scalar matrix in which each of the diagonal elements is $1/N$ so that the
estimates of all the unknowns are mutually orthogonal.

As a particular case of this general design, we may take $N = 16$, $p = 16$ (for
zero bias) or $p = 15$ (if there is bias), the completely orthogonalized design for
which is represented by the matrix

(8)  $X$ = [16 × 16 matrix with entries +1 and -1 and mutually orthogonal columns; the individual entries are not legible in the source]

for which $X'X$ is a scalar matrix of order $16 \times 16$, each diagonal element being
16. Again, a completely orthogonalized design for $N = 16$, $p < 16$ (for zero
bias) or $p < 15$ (if there is bias) is represented by a matrix $X$ obtained from the
matrix in (8) by omitting any $16 - p$ of its columns if there is zero bias, or
$16 - p - 1$ of its columns if there is bias. In the matrix $X$, permutation of
rows and columns is permissible and each such matrix represents a completely
orthogonalized design.
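A $2^m \times 2^m$ matrix of +1's and -1's with mutually orthogonal columns can be generated by Sylvester's doubling construction. The sketch below uses that standard construction as an assumption; it is not claimed to reproduce the author's hyper-cube construction or the particular matrix printed as (8), only a matrix with the same property $X'X = N I$. The function names are hypothetical.

```python
def sylvester_hadamard(m):
    """Return a 2^m x 2^m matrix of +1/-1 entries with mutually orthogonal
    columns, built by repeated doubling: H -> [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(m):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def gram(X):
    """Return X'X for a matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

X = sylvester_hadamard(4)                 # N = 16 weighings, up to 16 columns
G = gram(X)                               # expected to be 16 times the identity

# Omitting columns keeps the remaining columns orthogonal (here p = 5).
X_sub = [row[:5] for row in X]
G_sub = gram(X_sub)
```

The first column of this matrix consists of +1's throughout, so it can serve as the bias column when the scale has a bias.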

For the design given by Hotelling¹ for $N = 4$, $p = 3$ (zero bias), the efficiency
is 35 per cent. The completely orthogonalized design for which the efficiency
is 100 per cent is represented by the matrix

(9)  $X$ = [4 × 3 matrix with entries +1 and -1 and mutually orthogonal columns; the individual entries are not legible in the source]

¹ The allusions here and at the end of the next section are to designs on p. 305 of the
Hotelling paper [1], a passage concerned with designs subject to the restriction that the
entries of the matrix be 0's and +1's only, as is necessary in many types of measurement.
The more efficient designs given above, whose matrices involve -1's also, can be used only
in such cases as that of weighing in a balance, where the objects under investigation can be
put, some in one pan and some in the other. Such situations are considered in a different
part of Hotelling's paper.


4. First design for $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (non-zero
bias). For $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (if there is bias),
$m$ being any positive integer, what is probably the most efficient design available
is that represented by the matrix $X$ obtained from the corresponding matrix
$X$ for the general design of Section 3 above by adding a row 1, 1, $\cdots$, 1 to it.
The matrix $X'X$ for this design then comes out as


(10)  $X'X = \begin{pmatrix} N & 1 & 1 & \cdots & 1 \\ 1 & N & 1 & \cdots & 1 \\ 1 & 1 & N & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & N \end{pmatrix}$,

which is a symmetrical matrix of order $p \times p$ if there is zero bias, or of order
$(p + 1) \times (p + 1)$ if there is bias. The variance of each unknown for this
design is

(11)  $\frac{\sigma^2}{N - \dfrac{p - 1}{N + p - 2}}$ for zero bias,

or

(12)  $\frac{\sigma^2}{N - \dfrac{p}{N + p - 1}}$ if there is bias.

Thus the efficiency of this design is

(13)  $1 - \frac{p - 1}{N(N + p - 2)}$ for zero bias,

or

(14)  $1 - \frac{p}{N(N + p - 1)}$ if there is bias.

The loss of efficiency resulting from the adoption of this design is, therefore,
$\frac{p - 1}{N(N + p - 2)}$ for zero bias or $\frac{p}{N(N + p - 1)}$ if there is bias.

As a particular case of this, for $N = 5$, $p = 2$ (zero bias), probably the most
efficient design available is specified by

(15)  $X$ = [5 × 2 matrix with entries +1 and -1, namely the design of Section 3 for $N = 4$, $p = 2$ with a row 1, 1 added; the individual entries are not legible in the source]

The variance of each unknown in this case is $\frac{5\sigma^2}{24}$ and the efficiency of the design
is 96 per cent. For the design given by Hotelling for this case, the variance of
each unknown is $\frac{4\sigma^2}{7}$ and the efficiency is 35 per cent. It would thus appear
that, as judged by the criterion of efficiency as defined here, the design
represented by the matrix in (15) is more efficient than Hotelling's design.


5. Second design for $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$
(non-zero bias). Another interesting design for these values of $N$ and $p$ is that
represented by the matrix $X$ obtained by adding a row 1, 0, $\cdots$, 0 to the
corresponding matrix $X$ for the general design in Section 3 above. The matrix $X'X$
for this design is then the diagonal matrix

(16)  $X'X = \mathrm{diag}(N, N - 1, \cdots, N - 1)$

of order $p \times p$ (for zero bias) or $(p + 1) \times (p + 1)$ (for non-zero bias). As
the reciprocal of this matrix is also a diagonal matrix, the estimates of all the
unknowns are mutually orthogonal. The efficiency of this design is

(17)  $\frac{(N - 1)p}{Np - 1}$ for zero bias,

(18)  $\frac{N - 1}{N}$ for non-zero bias.

By comparing the efficiency of the first design given in (13) and (14) with that
of the second design in (17) and (18) respectively, it would appear that the
efficiency of the first design is always higher than that of the second design for
non-zero bias, and is also higher in the case of zero bias for $p > 1$, but equal for
$p = 1$.


6. First design for $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias). For $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias), $m$ being any positive integer and $r$ any positive integer $< 2^m$,
a highly efficient design is represented by the matrix $X$ obtained from the
corresponding matrix $X$ for the general design in Section 3 above by adding $r$
rows 1, 1, $\cdots$, 1 to it. The matrix $X'X$ for these designs then comes out as

(19)  $X'X = \begin{pmatrix} N & r & r & \cdots & r \\ r & N & r & \cdots & r \\ r & r & N & \cdots & r \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r & r & r & \cdots & N \end{pmatrix}$,

which is of order $p \times p$ for zero bias, or of order $(p + 1) \times (p + 1)$ for non-zero
bias. The variance of each unknown determined by this experiment is

(20)  $\frac{\sigma^2}{N - \dfrac{(p - 1)r^2}{N + (p - 2)r}}$ for zero bias,

or

(21)  $\frac{\sigma^2}{N - \dfrac{p r^2}{N + (p - 1)r}}$ if there is bias.

Hence the efficiency of this design is

(22)  $1 - \frac{(p - 1)r^2}{N[N + (p - 2)r]}$ for zero bias,

or

(23)  $1 - \frac{p r^2}{N[N + (p - 1)r]}$ if there is bias.

The loss of efficiency as a result of adopting this design is, therefore,
$\frac{(p - 1)r^2}{N[N + (p - 2)r]}$ for zero bias, or $\frac{p r^2}{N[N + (p - 1)r]}$ if there is bias.
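The zero-bias variance and efficiency for the design with $r$ added rows of 1's can be checked by exact arithmetic. The sketch assumes $N = 2^m + r$ and $X'X = (N - r)I + rJ$ of order $p$, where $J$ is the matrix of all 1's, and takes as the candidate reciprocal $C = \frac{1}{N - r}\left(I - \frac{r}{N + (p - 1)r} J\right)$; this closed form is supplied here as an assumption and verified by multiplication rather than quoted from the paper. The function name is hypothetical.

```python
from fractions import Fraction

def check_first_design(m, r, p):
    """Zero-bias case with N = 2^m + r weighings and p <= 2^m unknowns.
    Returns (C is the reciprocal of X'X, efficiency from C, closed-form efficiency)."""
    N = 2 ** m + r
    # X'X has N on the diagonal and r elsewhere.
    A = [[Fraction(N if i == j else r) for j in range(p)] for i in range(p)]
    d = Fraction(1, N - r)
    off = -Fraction(r, (N - r) * (N + (p - 1) * r))
    C = [[(d + off) if i == j else off for j in range(p)] for i in range(p)]
    product = [[sum(A[i][k] * C[k][j] for k in range(p)) for j in range(p)]
               for i in range(p)]
    identity = [[int(i == j) for j in range(p)] for i in range(p)]
    # All c_ii are equal, so p / (N * sum c_ii) reduces to 1 / (N * c_11).
    eff_from_inverse = 1 / (N * C[0][0])
    eff_closed_form = 1 - Fraction((p - 1) * r * r, N * (N + (p - 2) * r))
    return product == identity, eff_from_inverse, eff_closed_form

case_n5 = check_first_design(2, 1, 2)     # N = 5, p = 2, one added row
case_n11 = check_first_design(3, 3, 5)    # N = 11, p = 5, three added rows
```

For $N = 5$, $p = 2$ the efficiency comes out as 24/25, that is 96 per cent, in agreement with the figure quoted for that case.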


7. Second design for $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias). Another design for these values of $N$ and $p$ is that represented
by the matrix $X$ obtained from the corresponding matrix $X$ for the general
design in Section 3 above by adding to it $r$ rows 1, 0, 0, $\cdots$, 0. The matrix $X'X$
for this design is then given by

(24)  $X'X = \mathrm{diag}(N, N - r, \cdots, N - r)$,

which is of order $p \times p$ if there is zero bias, or of order $(p + 1) \times (p + 1)$ if
there is bias. Here also the estimates of all the unknowns are mutually
orthogonal. The efficiency of the design comes out to be

(25)  $\frac{(N - r)p}{Np - r}$ if there is zero bias,

(26)  $\frac{N - r}{N}$ if there is bias.

By comparing the efficiency of the first design of this type given in (22) and
(23) with that of the present design given in (25) and (26) respectively, it would
appear that in case of zero bias, the efficiency of the first design is higher than
that of the second design for $p > 1$, but equal for $p = 1$; and in case of non-zero
bias, the efficiency of the first design is always higher than that of the
second.
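The comparison can be checked over a range of cases. The sketch below takes the zero-bias efficiencies to be $1 - \frac{(p-1)r^2}{N[N + (p-2)r]}$ for the design with $r$ added rows 1, 1, $\cdots$, 1 and $\frac{(N - r)p}{Np - r}$ for the design with $r$ added rows 1, 0, $\cdots$, 0, with $N = 2^m + r$. These expressions are this sketch's reading of the formulas, and the function names are hypothetical.

```python
from fractions import Fraction

def efficiency_first(N, p, r):
    """Zero-bias efficiency when r rows 1, 1, ..., 1 are added."""
    return 1 - Fraction((p - 1) * r * r, N * (N + (p - 2) * r))

def efficiency_second(N, p, r):
    """Zero-bias efficiency when r rows 1, 0, ..., 0 are added."""
    return Fraction((N - r) * p, N * p - r)

# Enumerate m = 1..4, 1 <= r < 2^m, 1 <= p <= 2^m and record any case that
# contradicts "higher for p > 1, equal for p = 1".
violations = []
for m in range(1, 5):
    for r in range(1, 2 ** m):
        N = 2 ** m + r
        for p in range(1, 2 ** m + 1):
            first = efficiency_first(N, p, r)
            second = efficiency_second(N, p, r)
            ok = (first == second) if p == 1 else (first > second)
            if not ok:
                violations.append((m, r, p))
```

No violations are found in this range. Algebraically, the difference of the two losses has the sign of $(N - r)^2 (p - 1)$, which is positive for $p > 1$.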


8. Comprehensive general design when N is at our choice. When $N$ is at
our choice, we can always obtain a completely orthogonalized design by taking
$N$ equal to a sufficiently large power of 2. For $p = 2^m$, $m$ being any positive
integer, a completely orthogonalized design for $N = 2^m$, when there is zero
bias, has been given in Section 3 above. If, however, there is a bias, a
completely orthogonalized design can be constructed for $N = 2^{m+1}$. When
$p = 2^m + u$, where $u$ is a positive integer $< 2^m$, a completely orthogonalized design
is available for $N = 2^{m+1}$, whether the bias is zero or not.

For $N = 2^{m+1}$, this is the most efficient design, with 100 per cent efficiency,
but as $N$ is given higher powers of 2 than $2^{m+1}$, the variance of the estimate of
each unknown decreases. When $N = 2^l$, where $l > m + 1$, the variance of
each unknown is $\frac{1}{2^{l - m - 1}}$ of that for $N = 2^{m+1}$.
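The choice of $N$ described in this section amounts to taking the smallest power of 2 that is at least the number of columns of $X$, namely $p$ for zero bias and $p + 1$ when a bias is to be estimated. A minimal sketch follows; the function name is hypothetical, and the search starts at $N = 2$ because $m$ is taken to be a positive integer.

```python
def smallest_orthogonal_n(p, bias=False):
    """Smallest N = 2^m (m >= 1) admitting a completely orthogonalized design
    for p unknowns, with one extra column when there is a bias."""
    columns = p + (1 if bias else 0)
    N = 2
    while N < columns:
        N *= 2
    return N

def variance_ratio(l, m):
    """Variance at N = 2^l relative to N = 2^(m+1), for l > m + 1."""
    return 1 / 2 ** (l - m - 1)

n_zero_bias = smallest_orthogonal_n(4)               # p = 2^2, zero bias
n_with_bias = smallest_orthogonal_n(4, bias=True)    # p = 2^2, with bias
n_between = smallest_orthogonal_n(7, bias=True)      # p = 2^2 + 3, with bias
```

For $p = 4$ this gives $N = 4$ without bias and $N = 8$ with bias, and for $p = 7$ with bias it gives $N = 8$, in line with the cases listed above.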


REFERENCES 

[1] Harold Hotelling, "Some improvements in weighing and other experimental
techniques," Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[2] R. C. Bose, "The fundamental theorem of linear estimation," Proceedings of the
Thirty-first Indian Science Congress, 1944, Part III.

[3] C. Radhakrishna Rao, "On linear estimation and testing of hypotheses," Current
Science, Vol. 13 (1944), pp. 154-155.

[4] K. Kishen, "On Latin and hyper-Graeco-Latin cubes and hyper-cubes," Current
Science, Vol. 11 (1942), pp. 98-99.





NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




NOTE ON THE LAW OF LARGE NUMBERS AND “FAIR” GAMES 
By W. FELLER
Cornell University 


1. "Fair" games. Let $\{X_k\}$ be a sequence of independent random variables
with the same cumulative distribution function $V(x)$. Suppose that the
expectation

(1)  $E(X_k) = \int_{-\infty}^{\infty} x\, dV(x) = M$

exists, and put

(2)  $S_n = X_1 + \cdots + X_n$.

The weak law of large numbers states¹ that for every $\epsilon > 0$ and $n \to \infty$

(3)  $\Pr\{|S_n - nM| < \epsilon n\} \to 1$.

In the picturesque language of the theory of games this means that, after a
large number of trials, the accumulated gain $S_n$ will, with great probability, be
of the order of magnitude of $nM$. This led to the definition that a game is
"fair" if the entrance fee for each trial is $M$. Unfortunately this definition
creates the erroneous notion that a "fair" game is necessarily fair. To disprove
it we shall (section 3) exhibit an example which will show:

(I) A game can be "fair" and nevertheless such that the probability tends to one
that, after $n$ trials, the player will have sustained a loss $L_n = nM - S_n$ of the order
of magnitude $n(\log n)^{-\eta}$, where $\eta > 0$ is arbitrarily small. In other words, in our
example

(4)  $\Pr\{nM - S_n > (1 - \epsilon) n (\log n)^{-\eta}\} \to 1$.

Of course, $L_n$ is necessarily of smaller order of magnitude than $n$; however, our
example can be modified in such a way that the ratio of the loss $L_n$ to the
accumulated entrance fees $nM$ decreases as slowly as one pleases.

This shows that a "fair" game can be exceedingly disadvantageous.
Conversely, an "unfair" game can very well be advantageous. If a careful driver
insures his car, the game is clearly "unfair" according to definition, and yet some
states impose such games on drivers. Now in this and many other practical
cases the game is of such a nature that there is a very small probability $p$ of
winning a comparatively great amount $A$; the "fair" price would be $pA$. In
such cases the law of large numbers would be significant only if $n$ is large
compared to $1/p$, whereas actually the maximum number of games to be played is
comparatively small. Clearly any theory meets practical requirements only
if it makes allowance for the number of trials and makes the "fair" price depend
on the number of trials.

¹ Usually (3) is proved only under more restrictive hypotheses. Actually the finiteness
of $E(X_k)$ implies even the strong law of large numbers; cf. Kolmogoroff, Grundbegriffe der
Wahrscheinlichkeitsrechnung (Berlin 1933), p. 59.




















2. The Petersburg "paradox." For obvious reasons the classical theory of
probability was unable to provide a precise formulation of the law of large
numbers and to establish the actual conditions of its validity. Often it has
been looked upon as a direct consequence of the definition of probability, and
this led to the so-called Petersburg paradox which presents no difficulties to the
modern theory. It refers to the case where the expectation (1) is infinite. The
usual example exhibits a game in which the possible gains in each trial are
distributed according to

(5)  $\Pr\{X = 2^r\} = 2^{-r}$  ($r = 1, 2, \cdots$).

Here $M = \infty$. Now the law of large numbers (3) used to be proved (if at all)
only assuming the existence of moments of higher order. Nevertheless, the
classical theory postulated the validity of (3) even for $M = \infty$, and treating $\infty$
as a number (with $\infty - \infty = 0$) it argued that $\infty$ is a "fair" price for the game
as defined in (5). Great ingenuity was exercised in order to reconcile this
result with commonsense.² Actually one can pass from (3) to the limit $M \to \infty$,
but the only result to be arrived at is trivial and could be anticipated without
theory: If the player pays for each trial a fixed amount $A$, he is likely to have a
positive gain provided he plays sufficiently long, i.e., provided $n > N(A)$,
where $N(A)$ itself increases with $A$.

Instead of a paradox we reach the conclusion that the price should depend on
$n$, that is to say vary as the number of trials increases. For best results this
should be the case even if $M$ is finite. It should be noticed that in the Petersburg
case (5) a variable price can be determined so that a law of large numbers will
hold which is in every respect analogous to (3). In this formula $nM$ is simply
the accumulated amount of entrance fees; denoting it by $P_n$, formula (3) takes
on the equivalent form

(6)  $\Pr\{|S_n - P_n| < \epsilon P_n\} \to 1$.

It is this interpretation of (3) that leads to the notion of "fair" games. Now
the Petersburg game can also be played in a "fair" way:

(II) Let the player in the Petersburg game (5) at the k-th trial pay the amount³
$\log_2 k$. The accumulated entrance fees up to the n-th trial are $P_n \sim n \log_2 n$,
and the game is "fair" in the sense that the law of large numbers (6) holds. This
requirement determines the entrance fees essentially uniquely (that is to say up to
terms of smaller order of magnitude which, by definition, remain undetermined).

² Among the latest textbooks, von Mises (Wahrscheinlichkeitsrechnung, Leipzig-Wien
1931, p. 108f.) avoids the difficulty by declaring that (5) can not represent a collectif because
of its infinite tail. This viewpoint is legitimate, but makes the law of large numbers
inapplicable to practically all useful distributions. Fry (Probability and its Engineering Uses,
New York, 1928, p. 197) says: "The true explanation of the paradox is . . . based upon the
fact that in our every-day experience we have to deal only with individuals who have finite
fortunes and who would therefore be incapable of paying back the sums which are required
. . .". The problem does not seem to be mentioned in Uspensky's book.


3. Proofs. Theorems (I) and (II) follow easily from the following

LEMMA: Let $a_n \to \infty$ be a sequence of positive numbers; in order that there exist
a sequence $\{b_n\}$ such that

(7)  $\Pr\{|S_n - b_n| < \epsilon a_n\} \to 1$

it is necessary and sufficient that for every $\delta > 0$ simultaneously

(8)  $n \int_{|x| > \delta a_n} dV(x) \to 0, \qquad a_n^{-2}\, n \int_{|x| < a_n} x^2\, dV(x) \to 0$;

in this case (7) will hold with

(9)  $b_n = \sum_{k=1}^{n} \int_{|x| < a_n} x\, dV(x)$

(and, of course, for any other sequence $\{b_n'\}$ if and only if $|b_n - b_n'| = o(a_n)$).
This lemma is a simple consequence of the necessary and sufficient conditions
for the generalized law of large numbers⁴.

To prove theorem (II) we have to determine a sequence $\{a_n\}$ such that (7)
will hold for the distribution function defined in (5) and with $b_n \sim a_n$. A simple
computation shows that (8) will hold for any sequence $\{a_n\}$ which increases
faster than $n$. Moreover, the sequence $\{b_n\}$ defined by (9) will be of the same
order of magnitude as $\{a_n\}$ if, and only if, $a_n \sim n \log_2 n$. This proves (II).
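The "simple computation" behind (II) can be illustrated numerically. The sketch below assumes the Petersburg distribution in the form $\Pr\{X = 2^r\} = 2^{-r}$, $r = 1, 2, \cdots$, and a centering constant equal to $n$ times the mean truncated at $a_n = n \log_2 n$; both are readings adopted for this illustration, and the function names are hypothetical. Each outcome $2^r$ contributes $2^r \cdot 2^{-r} = 1$ to the truncated mean, so that mean is just the number of outcomes below the truncation point.

```python
import math

def truncated_mean(a):
    """Sum of x * Pr{X = x} over the outcomes x = 2^r (r >= 1) with 2^r < a.
    Every such outcome contributes exactly 1."""
    count = 0
    r = 1
    while 2 ** r < a:
        count += 1
        r += 1
    return count

def centering(n):
    """Return (b_n, a_n) with a_n = n log2 n and b_n = n * truncated mean at a_n."""
    a_n = n * math.log2(n)
    return n * truncated_mean(a_n), a_n

n = 2 ** 20
b_n, a_n = centering(n)
ratio = b_n / a_n
```

For $n = 2^{20}$ the ratio $b_n / a_n$ is 1.2. It tends to 1 only slowly, since the truncated mean is about $\log_2 n + \log_2 \log_2 n$.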

Now let $\eta > 0$ be arbitrary, and define the distribution function $V(x)$ to have
a density

(10)  $V'(x) = \frac{\eta}{x^2 \log^{1+\eta} x}$ for $x > e$;

at $x = 0$ the function $V(x)$ shall have a jump of magnitude

(11)  $1 - \int_e^{\infty} \frac{\eta\, dx}{x^2 \log^{1+\eta} x}$,

while $V(x)$ is constant in the intervals $x < 0$ and $0 < x < e$. For this
distribution function we have obviously $M = 1$.
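The value $M = 1$, and the truncated means needed in the lemma, follow from an elementary integration. The short derivation below assumes the density $V'(x) = \eta / (x^2 \log^{1+\eta} x)$ for $x > e$ and uses the substitution $t = \log x$; the jump at $x = 0$ contributes nothing to the mean.

```latex
\int_{|x| < a} x \, dV(x)
  = \int_{e}^{a} \frac{\eta \, dx}{x \log^{1+\eta} x}
  = \int_{1}^{\log a} \eta \, t^{-1-\eta} \, dt
  = \Bigl[ -t^{-\eta} \Bigr]_{1}^{\log a}
  = 1 - \log^{-\eta} a , \qquad a > e ,
```

and letting $a \to \infty$ gives $M = 1$. The deficit $\log^{-\eta} a$ of the truncated mean is what produces the systematic loss in the example.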


Next, let for $n > e$

(12)  $a_n = n \log^{-\eta} n$.

Then (8) holds and from (9) and (10) we obtain easily for large $n$

(13)  $b_n = \sum_{k=1}^{n} \{1 - \log^{-\eta} a_n\} < n - (1 - \epsilon) a_n$.

Substituting into (7) one sees that, again for sufficiently large $n$,

(14)  $\Pr\{S_n - n + (1 - \epsilon) a_n < \epsilon a_n\} \to 1$,

or, since $M = 1$,

(15)  $\Pr\{S_n - nM < -(1 - 2\epsilon) a_n\} \to 1$.

This proves (I).

³ $\log_2$ stands for the logarithm to the base 2.
⁴ Cf. Feller, Acta Univ. Szeged, Vol. 8 (1937), pp. 191-201.


A NOTE ON RANK, MULTICOLLINEARITY AND MULTIPLE
REGRESSION¹


By GERHARD TINTNER


Iowa State College









Let $X_{it}$ $(i = 1, 2, \cdots, M)$ be a set of $M$ random variables, each being observed at
$t = 1, 2, \cdots, N$. $X_{it} = M_{it} + y_{it}$. (This is essentially the situation envisaged
by Frisch [1]). The systematic part of our variables $M_{it} = E X_{it}$. The $y_{it}$ are
normally distributed with means zero. Their variances and covariances are
independent of $t$. The $M_{it}$ and $y_{it}$ are independent of each other. Define
$\bar X_i = \sum_t X_{it}/N$ the arithmetic mean of $X_{it}$ and $x_{it} = X_{it} - \bar X_i$ the deviation from
the mean. Then $a_{ij} = \sum_t x_{it} x_{jt}/(N - 1)$ gives the variances and covariances
of the observations. We want to determine the rank of the matrix of the
variances and covariances of $M_{it}$.

Now assume that $\|V_{ij}\|$ is an estimate of the variance-covariance matrix of the
error terms or "disturbances" $y_{it}$. The elements of this matrix are distributed
according to the Wishart distribution and are independent of the $M_{it}$. They
can be estimated as deviations from polynomial trends, as deviations from
Fourier series, by the Variate Difference Method, etc. The estimates could also
be based upon a priori knowledge if for instance the $y_{it}$ are interpreted as errors
of measurement. Assume that the estimate is based upon $N'$ observations.


The author is much obliged to Professors W. G. Cochran (Iowa State College), H. 
Hotelling (Columbia University), T. Koopmans (University of Chicago) and A. Wald 
(Columbia University) for advice and criticism with this paper. He has also profited by 
reading the unpublished paper: ‘‘On the Validity of an Estimate from a Multiple Regression 
Equation” by F. V. Waugh and R. C. Been which deals in part with a problem related to 
the one presented here. 





















Journal Paper No. J-1323 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project No. 730.










Form the determinantal equation:

(1) $|a_{ij} - \lambda V_{ij}| = 0$.

Apart from sampling fluctuations there should be $r$ solutions $\lambda = 1$ of equation
(1) if there are $r$ independent linear relationships between the $M_{it}$. The rank
of the variance-covariance matrix of $M_{it}$ is then $M - r$. Following a suggestion
of P. L. Hsu [2] made on the basis of the earlier work of R. A. Fisher [3] we form
the test function

(2) $\Lambda_r = (N - 1)(\lambda_1 + \lambda_2 + \cdots + \lambda_r)$,

where $\lambda_1$ is the smallest root of (1), $\lambda_2$ the next smallest, etc. Hence (2) is
$(N - 1)$ times the sum of the $r$ smallest roots of equation (1). The hypothesis to be tested is that
there are exactly $r$ independent linear relationships between the systematic
parts of our variables in the population. This quantity (2) is distributed like
$\chi^2$ with $r(N - M - 1 + r)$ degrees of freedom for large samples, i.e. if $N'$ becomes
large. It can be used for forming an opinion about the number of independent
relationships existing among the systematic parts of our variables ($M_{it}$).
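The computation behind the test function (2) can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes the determinantal equation (1) reads $|a_{ij} - \lambda V_{ij}| = 0$ with $V$ symmetric positive definite, reduces it to an ordinary symmetric eigenvalue problem through a Cholesky factor of $V$, and finds the roots by Jacobi rotations (a technique swapped in for convenience). The function names `generalized_roots` and `rank_test_statistic` are hypothetical, and the degrees of freedom are taken exactly as printed, $r(N - M - 1 + r)$.

```python
import math


def cholesky(V):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("V must be positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def symmetric_eigenvalues(S, sweeps=50, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, ascending."""
    n = len(S)
    A = [row[:] for row in S]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) entry after the rotation.
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def generalized_roots(a, V):
    """Roots of |a - lambda V| = 0, i.e. eigenvalues of L^{-1} a L^{-T}."""
    n = len(a)
    L = cholesky(V)
    cols = [forward_solve(L, [a[i][j] for i in range(n)]) for j in range(n)]
    B = [[cols[j][i] for j in range(n)] for i in range(n)]  # B = L^{-1} a
    C = [forward_solve(L, B[i]) for i in range(n)]          # C = B L^{-T}
    C = [[0.5 * (C[i][j] + C[j][i]) for j in range(n)] for i in range(n)]
    return symmetric_eigenvalues(C)


def rank_test_statistic(roots, r, N, M):
    """(N - 1) times the sum of the r smallest roots, with the printed d.f."""
    statistic = (N - 1) * sum(sorted(roots)[:r])
    degrees_of_freedom = r * (N - M - 1 + r)
    return statistic, degrees_of_freedom
```

A call such as `rank_test_statistic(generalized_roots(a, V), r, N, M)` returns the statistic together with the degrees of freedom to be used in a chi-square table; the comparison itself is left to the reader.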

The importance of the question of the rank lies in the following: Sometimes 
we are not so much interested in making predictions as to estimate the “‘true”’ 
relationships which exist in the population which corresponds to our sample 
(Wald) [4]. Practically speaking, these relationships and their estimation are 
of great importance in economic statistics, as Haavelmo has shown [5]. But a 
knowledge of the rank i.e. the number of independent relationships existing be- 
tween the systematic parts of the variables may also be of some significance for 
the problem of prediction. The inclusion of strongly correlated predictors 
cuts down on the number of degrees of freedom without contributing significantly 
to the reduction of the variance. 

The remainder of this paper will be concerned with an attempt to estimate
the relationships which exist in the population between the systematic parts
of the variables. This is an extension of the work of T. Koopmans [6] and the
author [7] who dealt with the special case in which there is only one relationship
between the systematic parts.

Suppose that we decide that there are $R$ independent relationships among the
systematic parts of our variables

(3) $k_{v0} + \sum_i k_{vi} M_{it} = f_{vt} = 0; \quad v = 1, 2, \cdots, R, \; t = 1, 2, \cdots, N.$

We desire to obtain estimates of these relationships. Our purpose here is not
prediction but estimation of the structural coefficients $k_{vi}$.

The method of maximum likelihood leads to the method of least squares if we
treat the $V_{ij}$ as constants. This is again permissible if $N'$ is large and our
estimates of the $V_{ij}$ become reasonably accurate. We have to minimize the
following sum of squares

(4) $Q = \sum_t Q_t$







where

(5) $Q_t = \sum_i \sum_j V^{ij} (x_{it} - m_{it})(x_{jt} - m_{jt})$,

where $\|V^{ij}\| = \|V_{ij}\|^{-1}$, the inverse of the variance-covariance matrix of the
errors. We also define $m_{it} = M_{it} - \bar M_i$ $(t = 1, 2, \cdots, N)$ where $\bar M_i$ is the
mean of $M_{it}$.

If there are $R$ relationships (3) they can be written by using only $R(M - R)$
coefficients $k_{vi}$ $(i = 1, 2, \cdots, M)$, if we disregard the constant terms $k_{v0}$, because
we are now dealing with deviations from means. We can for instance express
the first $(M - R)$ variables $m_{it}$ in terms of the last $R$ variables $m_{it}$. Hence,
we have to impose $R^2$ conditions upon the $MR$ coefficients $k_{vi}$ $(i = 1, 2, \cdots, M)$
appearing in (3).

We impose $R(R + 1)/2$ conditions as follows

(6) $\sum_i \sum_j k_{vi} k_{wj} V_{ij} = g_{vw} = \delta_{vw}$,

where $\delta_{vw}$ is a Kronecker delta. These conditions orthogonalize and normalize
the coefficients $k_{vi}$. We have now to adjust the $Q_t$ as given in (5) under the
conditions (6) by determining appropriate $m_{it}$. This is a problem of
restricted minima.

We introduce a new function

(7) $F_t = Q_t - \sum_v \mu_{vt} f_{vt}$,

where the $\mu_{vt}$ are Lagrange multipliers. Differentiating with respect to $m_{it}$,
setting equal to zero and absorbing the constant factor into the multipliers we get the solution:

(8) $\sum_j V^{ij} (x_{jt} - m_{jt}) = \sum_v \mu_{vt} k_{vi}$,

or, solving for $x_{it} - m_{it}$:

(9) $x_{it} - m_{it} = \sum_v \sum_j \mu_{vt} V_{ij} k_{vj}$.

Multiplying (9) by $k_{vi}$ and summing we get

(10) $\mu_{vt} = \sum_j k_{vj} x_{jt}$.

Hence we have

(11) $Q_t = \sum_v \mu_{vt}^2 = \sum_v \bigl( \sum_j k_{vj} x_{jt} \bigr)^2$.

Now we dispose of the remaining $R(R - 1)/2$ conditions

(12) $\sum_t \mu_{vt} \mu_{wt} = h_{vw} = 0, \quad v \neq w$.

We have to minimize $Q$ under the $R^2$ conditions (6) and (12). This is done
by finding the appropriate $k_{vi}$.
We form a new expression







(13) $G = Q + \sum_v \sum_w \beta_{vw} h_{vw} - \sum_v \sum_w \alpha_{vw} g_{vw}$,

where the $\alpha_{vw}$ and $\beta_{vw}$ $(v \neq w)$ are again Lagrange multipliers and $\beta_{vv} = 0$.
Because of considerations of symmetry we have: $\alpha_{vw} = \alpha_{wv}$ and $\beta_{vw} = \beta_{wv}$.
Differentiating with respect to $k_{vi}$ and setting equal to zero we get the condition

(14) $\sum_t \bigl( \sum_j k_{vj} x_{jt} \bigr) x_{it} + \sum_w \beta_{vw} \sum_t \bigl( \sum_j k_{wj} x_{jt} \bigr) x_{it} = \sum_w \alpha_{vw} \sum_j V_{ij} k_{wj}$.

Multiplying by $k_{vi}$ and summing we get

(15) $\sum_t \mu_{vt}^2 = \alpha_{vv}$.

Multiplying by $k_{zi}$ $(z \neq v)$ and summing we have

(16) $\beta_{vz} \sum_t \mu_{zt}^2 = \alpha_{vz}$.

Both (15) and (16) follow from conditions (6) and (12).
Exchanging the role of $v$ and $z$ in (16) we have also

(17) $\beta_{vz} \sum_t \mu_{vt}^2 = \alpha_{vz} \quad (v \neq z)$.

Hence we have $\alpha_{vw} = \beta_{vw} = 0$, if $v \neq w$. Inserting these results in (14) we get a
system of linear and homogeneous equations in the unknown coefficients $k_{vi}$.
The determinant of the system must be equal to zero in order to yield non-trivial
solutions. Trivial solutions are not admitted because of (6). Hence the $\alpha_{vv}$
are simply the roots $k$ of the equation $|\sum_t x_{it} x_{jt} - k V_{ij}| = 0$.

Introducing

(18) $\lambda_v = \alpha_{vv}/(N - 1)$,

expression (14) becomes actually the determinantal equation (1). This expression
can be used to find the $R$ smallest latent roots $\lambda_v$ and the corresponding
characteristic vectors $k_{vi}$ by Hotelling's methods [8].

The constants of the equation (3) are finally determined by the condition
that the optimum solutions have to go through the means of the variables

(19) $k_{v0} + \sum_i k_{vi} \bar X_i = 0$.

The distribution of the variances and covariances of the observations has recently
been established by T. W. Anderson and M. A. Girshick for the cases $R = M - 1$
and $R = M - 2$ [9].


REFERENCES

[1] R. Frisch: Statistical Confluence Analysis by Means of Complete Regression Systems,
Oslo, 1934.

[2] P. L. Hsu: "On the problem of rank and the limiting distribution of Fisher's test
function," Annals of Eugenics, Vol. 11 (1941), pp. 39 ff.

[3] R. A. Fisher: "The statistical utilization of multiple measurements," Annals of
Eugenics, Vol. 8 (1938), pp. 376 ff.

[4] A. Wald: "The fitting of straight lines if both variables are subject to error," Annals
of Math. Stat., Vol. 11 (1940), pp. 284 ff.

[5] T. Haavelmo: "The probability approach in econometrics," Econometrica, Vol. 12
(1944), Supplement.

[6] T. Koopmans: Linear Regression Analysis in Economic Time Series, Haarlem, 1937.

[7] G. Tintner: "An application of the variate difference method to multiple regression,"
Econometrica, Vol. 12 (1944), pp. 97 ff.

[8] H. Hotelling: "Simplified calculation of principal components," Psychometrika, Vol. 1
(1936), pp. 27 ff.

[9] T. W. Anderson and M. A. Girshick: "Some extensions of the Wishart distribution,"
Annals of Math. Stat., Vol. 15 (1944), pp. 354 ff.


NOTE ON THE DISTRIBUTION OF THE SERIAL
CORRELATION COEFFICIENT¹


By WILLIAM G. MADOW


Bureau of the Census 





The distribution of the serial correlation coefficient when $\rho = 0$ has been
previously obtained.² The purpose of this note is to derive the distribution of
the serial correlation coefficient, using the circular definition, when $\rho \neq 0$.

Let us assume that the random variables $x_1, \cdots, x_N$ have a joint normal
distribution³ $p(x_1, \cdots, x_N \mid A, B, \mu)$ where

$\log p(x_1, \cdots, x_N \mid A, B, \mu) = \log K_1 - \tfrac{1}{2} \bigl[ A \sum (x_i - \mu)^2 + 2B \sum (x_i - \mu)(x_{i+L} - \mu) \bigr]$,

the term in the bracket is positive definite, $K_1$ is independent of the $x_i$ and if
$i + L > N$ then $x_{i+L} = x_{i+L-N}$. It is then clear that $\bar x$, $V_N$, and ${}_L C_N$, where
$\bar x$ is the arithmetic mean, $V_N = \sum (x_i - \bar x)^2$ and

${}_L C_N = \sum (x_i - \bar x)(x_{i+L} - \bar x)$

are sufficient statistics with respect to the estimation of $\mu$, $A$, and $B$.
Let $V_N \, {}_L R_N = {}_L C_N$ define ${}_L R_N$, the serial correlation coefficient. Then if





1 Presented at a meeting of the Cowles Commission for Economic Research in Chicago, 
January 31, 1945. 

2 See R. L. Anderson, ‘‘Distribution of the serial correlation coefficient’, pp. 1-13 and T. 
Koopmans, “‘Serial correlation and quadratic forms in normal variables’’, pp. 14-33, Annals 
of Math. Stat., Vol. XIII, No. 1, March, 1942. 






3 The expression $p(\xi_1, \cdots, \xi_m \mid \theta_1, \cdots, \theta_q)$ means the probability density or the
distribution of the random variables $\xi_1, \cdots, \xi_m$ for the given values of the parameters
$\theta_1, \cdots, \theta_q$. When used as an index of summation or multiplication, the letter $i$ will
assume all values from 1 through $N$.


























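$A = 1$, $B = 0$ Anderson has shown⁴ that, if $N$ is odd, the joint distribution of
${}_1 R_N$ and $V_N$ is given by

(1) $D(R_N, V_N) = K\, V_N^{\frac{1}{2}(N-3)}\, e^{-\frac{1}{2} V_N} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac{1}{2}(N-5)} / \alpha_i$, for $\lambda_{m+1} \le R_N \le \lambda_m$

where

$R_N = {}_1 R_N, \quad \lambda_k = \cos \dfrac{2\pi k}{N}, \quad \alpha_i = \prod_{j=1}^{\frac{1}{2}(N-1)} (\lambda_i - \lambda_j)$, for all $j \neq i$

and $K^{-1} = 2^{\frac{1}{2}(N-1)}\, \Gamma[\tfrac{1}{2}(N - 3)]$; while if $N$ is even, the same formula holds except
that

$\alpha_i = \prod_{j=1}^{\frac{1}{2}(N-2)} (\lambda_i - \lambda_j) \sqrt{\lambda_i + 1}$, for all $j \neq i$.

We now extend Anderson's distributions to the case where it is not assumed
that $A = 1$ and $B = 0$.
As a means of extending⁵ Anderson's distribution let us recall that if $x_1, \cdots, x_N$
have a distribution $p(x_1, \cdots, x_N \mid \theta_1, \cdots, \theta_q)$ depending on several
parameters $\theta_1, \cdots, \theta_q$, and if $z_1, \cdots, z_k$ are a sufficient set of statistics with respect
to $\theta_1, \cdots, \theta_q$, i.e.

$p(x_1, \cdots, x_N \mid \theta_1, \cdots, \theta_q) = h(z_1, \cdots, z_k \mid \theta_1, \cdots, \theta_q)\, m(x_1, \cdots, x_N)$

where $m(x_1, \cdots, x_N)$ is independent of $\theta_1, \cdots, \theta_q$, then if the distribution of
$z_1, \cdots, z_k$ is found, assuming $\theta_1, \cdots, \theta_q$ have specific values $\theta_1^0, \cdots, \theta_q^0$,
then it follows that

$p(z_1, \cdots, z_k \mid \theta_1, \cdots, \theta_q) = p(z_1, \cdots, z_k \mid \theta_1^0, \cdots, \theta_q^0)\, \dfrac{h(z_1, \cdots, z_k \mid \theta_1, \cdots, \theta_q)}{h(z_1, \cdots, z_k \mid \theta_1^0, \cdots, \theta_q^0)}$.

We may call Anderson's distribution given in (1), $p(R_N, V_N \mid 1, 0)$, i.e.

$p(R_N, V_N \mid 1, 0) = D(R_N, V_N)$.

Furthermore, $\bar x$ is distributed independently of $R_N$ and $V_N$ for all values of $A$
and $B$ and hence by a simple transformation,⁶ we can apply the above theorem.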


4 Anderson loc. cit. p. 3 and p. 5. Although the remainder of the note deals only with 
the case where L = 1 the procedure is general and may be easily carried through for other 
lags. 

5 See W. G. Madow, "Contributions to the theory of multivariate statistical analysis,"
Trans. of the Amer. Math. Soc., Vol. 44, No. 3, November 1938, p. 461.

6 For a proof that an orthogonal transformation of the variable $x_i - \mu$ exists such that
$V_N$ and ${}_L C_N$ are simultaneously reduced to canonical forms involving the same $N - 1$ of
the variables of the transformation, and $\sqrt{N}(\bar x - \mu)$ is the $N$th variable of the
transformation, see J. von Neumann, "Distribution of the ratio of the mean square successive
difference to the variance," Annals of Math. Stat., Vol. XII, No. 4, December 1941, pp. 368, 369.
The proof there is given for $V_N$ and $\sum (x_i - x_{i+1})^2$ but is easily extended to this case.
Then it is easy to show that $\sqrt{N}(\bar x - \mu)$ is independently distributed of $V_N$ and ${}_L C_N$ and
has distribution $\log p[\sqrt{N}(\bar x - \mu) \mid A, B] = \log K_2 - \tfrac{1}{2}[A + 2B] N (\bar x - \mu)^2$ where
$K_2 = (2\pi)^{-1/2} (A + 2B)^{1/2}$ and $K_2 K_3 = K_1$.







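Then

$p(R_N, V_N \mid A, B) = p(R_N, V_N \mid 1, 0)\, \dfrac{K_3\, e^{-\frac{1}{2}(A V_N + 2 B R_N V_N)}}{(2\pi)^{-\frac{1}{2}(N-1)}\, e^{-\frac{1}{2} V_N}}$.

Hence it follows that,

$p(R_N, V_N \mid A, B) = K K_3 (2\pi)^{\frac{1}{2}(N-1)}\, V_N^{\frac{1}{2}(N-3)}\, e^{-(A/2 + B R_N) V_N} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac{1}{2}(N-5)} / \alpha_i$,

for $\lambda_{m+1} \le R_N \le \lambda_m$, where the $\alpha_i$ have different values according to whether $N$
is odd or even. In order to evaluate $p(R_N \mid A, B)$ we then need only integrate
out $V_N$. Now

$\int_0^\infty V_N^{\frac{1}{2}(N-3)}\, e^{-(A/2 + B R_N) V_N}\, dV_N = \Gamma[\tfrac{1}{2}(N - 1)]\, (A/2 + B R_N)^{-\frac{1}{2}(N-1)}$.

Hence

$p(R_N \mid A, B) = K K_3 (2\pi)^{\frac{1}{2}(N-1)}\, \Gamma[\tfrac{1}{2}(N - 1)]\, (A/2 + B R_N)^{-\frac{1}{2}(N-1)} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac{1}{2}(N-5)} / \alpha_i$.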


The parameters $K_1$, $A$ and $B$ depend on the different types of assumptions that
may be made. In general

$K_1 = (2\pi)^{-\frac{1}{2} N}\, \Delta^{1/2}$

where $\Delta$ is a circulant $(a_1, \cdots, a_N)$ such that

$a_1 = A, \quad a_{L+1} = B, \quad a_{N-L+1} = B, \quad a_i = 0$ otherwise,

and hence, for $L = 1$,

$\Delta = \prod_{i=1}^{N} \Bigl( A + 2B \cos \dfrac{2\pi i}{N} \Bigr) = \prod_{i=1}^{N} (A + 2B \lambda_i)$.

Then, one assumption is

$A = \dfrac{1 + \rho^2}{\sigma^2}, \quad B = -\rho/\sigma^2$

where $\rho$ is the "true" serial correlation coefficient. Other assumptions are
possible.⁷ However, these vary with the problem under consideration and may
be left for further examination.
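The final density can be evaluated numerically; the following is a minimal sketch for odd $N$ and lag $L = 1$ only. It assumes the reading $p(R_N \mid A, B) = K K_3 (2\pi)^{\frac{1}{2}(N-1)} \Gamma[\frac{1}{2}(N-1)] (A/2 + B R_N)^{-\frac{1}{2}(N-1)} \sum_{i \le m} (\lambda_i - R_N)^{\frac{1}{2}(N-5)}/\alpha_i$, together with $K_2 K_3 = K_1$, $K_1 = (2\pi)^{-N/2} \Delta^{1/2}$ and $\Delta = \prod_{i=1}^{N} (A + 2B\lambda_i)$; under those assumptions the constant collapses to $\frac{1}{2}(N-3) \bigl[ \prod_{k=1}^{N-1} (A + 2B\lambda_k) \bigr]^{1/2}$, which is a simplification made here and not a formula quoted from the note. The function names are hypothetical.

```python
import math


def serial_correlation_density(R, N, A=1.0, B=0.0):
    """Density of the lag-1 circular serial correlation at R, odd N only."""
    if N % 2 == 0 or N < 5:
        raise ValueError("this sketch covers odd N >= 5 only")
    half = (N - 1) // 2
    # lambda_k = cos(2 pi k / N) for k = 1..(N-1)/2, in decreasing order.
    lam = [math.cos(2.0 * math.pi * k / N) for k in range(1, half + 1)]
    if R <= lam[-1] or R >= lam[0]:
        return 0.0
    power = 0.5 * (N - 5)
    total = 0.0
    for i, lam_i in enumerate(lam):
        if lam_i <= R:
            break  # only the terms i = 1..m with lambda_i > R contribute
        alpha_i = 1.0
        for j, lam_j in enumerate(lam):
            if j != i:
                alpha_i *= lam_i - lam_j
        total += (lam_i - R) ** power / alpha_i
    # Simplified constant (an assumption of this sketch, see the lead-in).
    prod = 1.0
    for k in range(1, N):
        prod *= A + 2.0 * B * math.cos(2.0 * math.pi * k / N)
    const = math.sqrt(prod) * 0.5 * (N - 3)
    return const * (A + 2.0 * B * R) ** (-0.5 * (N - 1)) * total


def total_probability(N, A=1.0, B=0.0, steps=20000):
    """Midpoint-rule integral of the density over its support."""
    half = (N - 1) // 2
    lo = math.cos(2.0 * math.pi * half / N)
    hi = math.cos(2.0 * math.pi / N)
    h = (hi - lo) / steps
    return h * sum(
        serial_correlation_density(lo + (s + 0.5) * h, N, A, B)
        for s in range(steps)
    )
```

With $\sigma = 1$ and $\rho = 0.3$, the assumption $A = (1 + \rho^2)/\sigma^2$, $B = -\rho/\sigma^2$ gives $A = 1.09$, $B = -0.3$; `total_probability(7, 1.09, -0.3)` is then a quick check that the density integrates to approximately one.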


7 One possible alternative definition is given by W. J. Dixon, ‘‘Further contributions to 
the problem of serial correlation”, Annals of Math. Stat., Vol. XV, No. 2, June 1944, p. 120, 
equation (2.1). 







NOTE ON A PAPER BY C. W. COTTERMAN AND L. H. SNYDER 
By H. B. MANN¹


Ohio State University 


C. W. Cotterman and L. H. Snyder [1] gave a method to test simple Mendelian
inheritance in randomly collected data. From a population assumed to
be at equilibrium a sample is taken. The number of homozygous recessives in
the sample is known. We wish to estimate the number of heterozygous
individuals in the sample.

Let $\alpha$ be the proportion of recessive genes among all genes in the population;
$\pi$, $\rho$, $\tau$ the proportion in the population of homozygous recessives, heterozygous
and homozygous dominant individuals respectively and $p$, $r$, $t$ the sampling
values of $\pi$, $\rho$, $\tau$. Then

(1) $\pi = \alpha^2, \quad \rho = 2\alpha(1 - \alpha), \quad \tau = (1 - \alpha)^2, \quad p + r + t = 1.$

Cotterman and Snyder use as an estimate of $r$ the quantity $2\sqrt{p}\,(1 - \sqrt{p})$.
It is the purpose of this note to show that this estimate is for all practical purposes
equivalent to the maximum likelihood estimate of $r$.

The joint distribution of $p$, $r$ and $t$ in samples of $n$ is given by

(2) $P(p, r, t) = \dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\, (nr)!\, (nt)!} = \dfrac{n!\, \alpha^{2np} [2\alpha(1 - \alpha)]^{nr} (1 - \alpha)^{2nt}}{(np)!\, (nr)!\, (nt)!}$,

where $P(p, r, t)$ is the probability of obtaining the values $p$, $r$, $t$ in samples of $n$.
We wish to maximize $P(p, r, t)$ for fixed values of $p$ with respect to $\alpha$ and $r$.
Maximizing first with respect to $\alpha$ one easily obtains

(3) $2\alpha = 2p + r$.
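The step leading to (3) can be written out. The following worked computation is a sketch that assumes the likelihood in (2) is the trinomial with $\pi = \alpha^2$, $\rho = 2\alpha(1 - \alpha)$, $\tau = (1 - \alpha)^2$; only the factors depending on $\alpha$ matter:

```latex
\begin{align*}
\log P &= \text{const} + (2np + nr) \log \alpha + (nr + 2nt) \log (1 - \alpha), \\
\frac{\partial \log P}{\partial \alpha}
  &= \frac{2np + nr}{\alpha} - \frac{nr + 2nt}{1 - \alpha} = 0
  \;\Longrightarrow\;
  \alpha = \frac{2np + nr}{2n(p + r + t)} = \frac{2p + r}{2},
\end{align*}
```

where the last equality uses $p + r + t = 1$.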


We can regard $\alpha$ as a continuous parameter and hence (3) must hold at any
maximum of $P(p, r, t)$. For any maximum of $P(p, r, t)$ we must further have

$\dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\, (nr)!\, (nt)!} \ge \dfrac{n!\, \pi^{np} \rho^{nr+1} \tau^{nt-1}}{(np)!\, (nr + 1)!\, (nt - 1)!}$

and

$\dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\, (nr)!\, (nt)!} \ge \dfrac{n!\, \pi^{np} \rho^{nr-1} \tau^{nt+1}}{(np)!\, (nr - 1)!\, (nt + 1)!}$.

This leads to the inequalities

(4) $\dfrac{\tau}{nt} \ge \dfrac{\rho}{nr + 1}, \qquad \dfrac{\rho}{nr} \ge \dfrac{\tau}{nt + 1}$.

Substituting $t = 1 - p - r$, $\tau = 1 - \pi - \rho$ one easily obtains from (4)


1 Research under a grant of the Research Foundation of Ohio State University.







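(5) $\dfrac{\rho n - \rho n p + \rho}{n(1 - \pi)} \ge r \ge \dfrac{\rho n - \rho n p - \tau}{n(1 - \pi)}$.

The difference of the two bounds is $1/n$. Hence $r$ must satisfy an equation

$r = \dfrac{\rho n - \rho n p + \rho}{n(1 - \pi)} - \dfrac{\epsilon}{n}, \quad 0 \le \epsilon \le 1.$

Substituting the values for $\rho$, $\pi$ and $r$ from (1) and (3) we obtain

$\alpha^2 - \dfrac{\alpha}{n} \Bigl( 1 - \dfrac{\epsilon}{2} \Bigr) - p + \dfrac{\epsilon}{2n} = 0.$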
Since 0 < € < 1 we obtain from (3) 
1 at, 1 a | 
- a 2r°2— —_ a a 


From (6) we see that for all practical purposes we may use the estimate
$r = 2\sqrt{p}\,(1 - \sqrt{p})$.
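The near-equivalence of the two estimates can be checked by brute force. The sketch below assumes the trinomial likelihood (2) with $\alpha$ eliminated through (3), $\alpha = (2p + r)/2$, and simply searches every admissible count $nr$ for the largest likelihood; it is an illustration of the claim, not the argument of the note. The function names are hypothetical.

```python
import math


def log_likelihood(n, n_p, n_r):
    """Log of P(p, r, t) with alpha set to (2p + r) / 2, as in equation (3)."""
    n_t = n - n_p - n_r
    alpha = (2 * n_p + n_r) / (2.0 * n)
    return (
        math.lgamma(n + 1)
        - math.lgamma(n_p + 1)
        - math.lgamma(n_r + 1)
        - math.lgamma(n_t + 1)
        + 2 * n_p * math.log(alpha)
        + n_r * math.log(2.0 * alpha * (1.0 - alpha))
        + 2 * n_t * math.log(1.0 - alpha)
    )


def ml_heterozygote_proportion(n, n_p):
    """Maximum likelihood value of r for a fixed count n_p of recessives."""
    if not 0 < n_p < n:
        raise ValueError("need 0 < n_p < n so that 0 < alpha < 1")
    best_n_r = max(range(0, n - n_p + 1),
                   key=lambda n_r: log_likelihood(n, n_p, n_r))
    return best_n_r / n


def cotterman_snyder_estimate(p):
    """The estimate 2 sqrt(p) (1 - sqrt(p)) discussed in the text."""
    root = math.sqrt(p)
    return 2.0 * root * (1.0 - root)
```

For example, with $n = 400$ and $64$ homozygous recessives ($p = 0.16$), `cotterman_snyder_estimate(0.16)` and `ml_heterozygote_proportion(400, 64)` can be compared directly; the note's argument suggests the two should differ by an amount of order $1/n$.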


REFERENCE 


[1] C. W. Cotterman and L. H. Snyder, "Tests of simple Mendelian inheritance in
randomly collected data of one and two generations," Jour. Am. Stat. Assn.,
Vol. 34 (1939), pp. 511-523.










NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.
Personal Items 


Dr. R. G. D. Allen, who has been associated with the Combined Production 
and Resources in Washington has returned to the London School of Economics. 

Dr. Kenneth J. Arnold, who has been doing war research work with the 
Columbia University Statistical Research Group has returned to his position 
at the University of Wisconsin. 

Dr. Lee A. Aroian, on leave from Hunter College is serving as a research 
associate in the Applied Mathematics Panel Project at Berkeley, California 
under the direction of Professor Neyman. 

Dr. Ernest E. Blanche has been appointed to the teaching staff of the Army
University organized by the War Department for American veterans at Florence,
Italy.

Assistant Professor Z. W. Birnbaum of the University of Washington has 
been promoted to an associate professorship. 

Dr. Alva E. Brandt has returned from the Operational Research Section of 
the Ninth Air Force in Europe. 

Associate Professor R. S. Burington of the Case School of Applied Science
has received the Meritorious Civilian Award from the United States Navy.

Dr. Irving W. Burr has been promoted to an associate professorship at Pur- 
due University. 

Miss Frances Campbell, after receiving her doctorate at Michigan in June, 
has returned to her position at George Pepperdine College, Los Angeles. 

Professor Harry C. Carver, after a year of service with the Army Air Forces, 
has returned to the University of Michigan. 

Professor W. G. Cochran has returned to Iowa State College from a special 
mission to Germany. 

Professor Churchill Eisenhart, who has been doing war research work with 
the Columbia University Statistical Research Group, has returned to the 
University of Wisconsin. 

Miss Mary Elveback has been appointed to an assistant professorship at 
Rockford College. 

Assistant Professor C. H. Fischer of the University of Michigan has been 
promoted to an associate professorship. 

Mr. Elvin A. Hoy, who has spent three years with the War Production Board, 
is now Chief of the Statistics Section of the Bureau of Research and Statistics 
of the Social Security Board. 

Professor P. L. Hsu of Kunming, China, has been appointed to a visiting 
professorship of statistics at Columbia University, beginning January 1946. 

Dr. Doncaster G. Humm has received an honorary Doctor of Science degree 
at Bucknell University. 






Mr. Joseph M. Juran who has served during the war with the Foreign Eco- 
nomic Administration, is now Chairman of the Department of Administrative 
Engineering at New York University. 

Dr. Eugene Lukacs has been appointed Professor and Head of the Mathe- 
matics Department at Our Lady of Cincinnati College. 

Dr. R. v. Mises of Harvard University has been appointed to a professor- 
ship of aerodynamics and applied mathematics. 

Professor A. M. Mood has returned from Princeton University to his position 
at Iowa State College. 

Assistant Professor Henry Scheffé of Syracuse University has been granted 
leave of absence to serve as senior mathematician with Princeton University 
Station of Division 2 of NDRC. 


Symposium at the University of California 





A Symposium on Mathematical Statistics and Probability was held at the 
University of California at Berkeley on August 13-18, 1945. Those partici- 
pating in the symposium as speakers or chairmen were: 





Dean G. P. Adams, Prof. E. B. Babcock, Prof. E. M. Beesley, Prof. B. A. Bernstein, Prof.
Egon Brunswik, Prof. A. H. Copeland, Prof. P. H. Daus, Lt. Comm. F. W. Dresch, Prof.
G. C. Evans, Miss Evelyn Fix, Prof. Harold Hotelling, Prof. Victor F. Lenzen, Prof. Jay L.
Lush, Prof. J. H. McDonald, Prof. George F. McEwen, Prof. J. Neyman, Prof. G. Polya,
Prof. Hans Reichenbach, Prof. A. C. Schaeffer, Prof. Morgan Ward, and Dr. Jacob
Wolfowitz.


New Members 





The following persons have been elected to membership in the Institute : 

Abbey, Helen, M.A. (Michigan) Stat., Bur. of Records & Stat. Mich. Dept. of Health, 916 
N. Chestnut, Lansing, Michigan. 

Acton, Forman, Ch. E. (Princeton) 17'/4 Army of the U.S., SED Barracks Area, Oak Ridge, 
Tenn. 

Aitchison, Beatrice, Ph.D. (Johns Hopkins) Econ. & Stat. Analy., I, CC. 1929 S. St., 
N.W. Wash., 9, D. C. 

Auner, George, A.B. (Western Reserve) Stat. Ohio High. Plan. Sur., 576 So. 18th St. * 192 
Arlington, Va. 

Bartlett, Maurice, D.Sc. (London) Univ. Lecturer, Cambridge, 137 Chesterton Road, Cam- 
bridge, Eng. 

Berwick, Leo, A.B. (New York Univ.) Capt., A. C. Asst. to Surgeon Stat. Unit of Psych. 
Sect. Hq. AFTRC, T & P Bldg., Fortworth 2, Texas. 

Blackwell, Asst. Prof. David, Ph.D. (Illinois) Math. Dept. Howard Univ. Wash., D.C. 

Borland, James, M.A. (Indiana) Capt., Ex. Officer, Inspect. Office, Pine Bluff Arsenal, 
Ark. 

Brown, Prof. Theo., Ph.D. (Yale) Bus. Stat. Harvard Bus. School, Soldier’s Field, Boston 
63, Mass. 

Bunke, Alfred, M.A. (Columbia) Sen. Stat. N. Y. State Dept. of Labor, 3? Parkwood St.
Albany 3, N.Y.

Burington, Asso. Prof. Richard, Ph.D. (Ohio) On leave from Case School of Applied Sci- 
ence, Cleveland, Ohio, at Present, Head Math., Bu. Ord. USN 5200 N. Carlin Spring 

Rd., Arlington, Va. 










































Campbell, James, Ph.D. (Edinburgh) Univ. Math. Lecturer, Victoria Univ. Coll. Well,
W.I. New Zeal.

Churchill, Edmund, A. M. (Columbia) 1585 Union Port Road, New York 2, N.Y. 

Cornfield, Jerome, B.S. (New York Univ.) Stat. Dept. of Labor, R.F.D. #*2 Herndon, 
Va. 

Cruden, Dorothy, A.B. (California) Stat.in Sampling Sect. Spec. Sur. Div. Bur. of Census 
% Pop. Div. Wash., D.C. 

Daniel, Cuthbert, M.S. (Mass. Inst. Tech.) Stat. Eng., Carbide and Carbon Chem. Corp.,
460 East Drive, Oak Ridge, Tenn.

David, Florence, Ph.D. (London) Univ. Sect. Stat. Dept. Univ. Coll., London, W.C. 1, 
England. 

De Garis, Prof. Charles, Ph.D. (Johns Hopkins) Univ. of Okla. School of Med., Okla. City,
4. Okla.

Echegaray, Miguel, C.K. Ag. Attache to the Spanish Embassy, 2700 15th St. N.W. Wash.,
D.C.

Ede, Richard, B.S. (Wisconsin) Chemistry Devel. Metallurgist, Gary Works, Car. Steel 
Ill. 547 Fillmore St., Gary, Indiana. 

Ewart, Robert, A.B. (New York Univ.) Research Physicist, Ballistics Dept. Des Moines 
Ord. Plant 683-46th St. Des Moines 12, Iowa. 

Federer, Walter, M.S. (Kansas State) Research-Ag. Stat. Stat. Lab., Iowa State Coll. 
Ames, Iowa. 

Freeman, Richard, B.Sc. (McMaster) Research Chemist. 1 Maple Ave., Hamilton, Ontario, 
Canada. 

Goldrosen, David, B.S. (Worcester Poly Inst.) Lt. USNR Quality Control Officer, Insp. 
of Naval Mat’l. 204 Ward St. Newton Centre, Mass. 

Goodman, Albert, Supervisor Stat. Control, Quality Control, Westinghouse Elec. Corp., 
Essington, Pa. 

Grant, Asst. Prof. David, Ph.D. (Stanford) Dept. of Psych., Univ. of Wis., Madison 6, 
Wisconsin. 

Greenhouse, Samuel, B.S. (City Coll. N.Y.) 7/4 U.S. Army, 5815-13th St. N.W. Wash.., 
11, D. C. 

Gretton, Owen, A.B. (Brown) Acting Chief, Ind. Div. Sen. Econ., 10157 Old Bladensburg 
Road, Silver Spring, Maryland. 

Hayden, Byron, A.B. (Geo. Wash. Univ.) Econ. Stat. A. A. F. Wash. D. C. 1301 S. Cleve- 
land St., Arlington, Va. 

Hecht, Bernard, B.E.C. (City Coll. of N. Y.) T/sgt, 516 Corp., Army-Navy Electronics 
Stand. Agency 42 Washington Village, Asbury Park, N. J. 

Haufek, Lyman, M.B.A. (Northwestern) U.S. Army Hq. ASF, Chief Supply Stat. Unit, 
1121 New Hampshire Ave., N.W., Wash. 7, D.C. 

Kampschaefer, Margaret, A.B. (Indiana) Stat. Bur. of Labor Stat. 1037 E. Blackford 
Ave., Evansville, 13, Indiana. 

Kozakiewica, Waclaw, Ph.D. (Warsaw) Inst. in Math., Univ. of Saskatoon, Saskatoon, 
Canada. 

Laguardia, Prof. Rafael, Director of Math & Stat. (Univ. of Uruguay) Fine Hall, Prince- 
ton Univ., Princeton, N. J. 

Leighton, Walter, Ph.D. (Harvard) On leave at Northwestern as Director, Applied Math. 
Group (NORC) Lecturer in Math. The Rice Inst. 1704 Judson Ave., Evanston, Illinois. 

Lieblein, Julius, M.A. (Brooklyn Coll.) Econ. Anal. Room 4013, U. 8. Trea. Dept., 15th 
& Penna. N.W. Wash. 25, D.C. 

Lien, Roy, M.S. (Oregon State) Rate Stat., Northwestern Elect. Co., Portland, Oregon, 
3121 S.E. Division St., Portland 2, Oregon. 

Lonseth, Asst. Prof. Arvid, Ph.D. (California) Math. Dept. Northwestern University,
Evan., Ill.














Michalup, Eric, Ph.D. (Univ. of Vienna) Math & Astronomy Actuary, Apartado 848,
Caracas, Venezuela.

Monro, Sutton, B.S. (Mass. Inst. Tech.) Head of Str. Staff Unit. Amm. Div. Naval Ord.
Lab. Lt. USNR 3433 Martha Custis Dr. Alexandria, Va.

Nilson, Hugo, Ph.D. (Minnesota) Chemist in Charge Fishery Tech. Lab. U.S. Fish &
Wildlife Serv. College Park, Maryland.

Nichols, Russell, B.A. (DePauw) Sergeant, U. S. Army Co. A. 586 A 1, Kn APO 656,
NYC (838-745-907).

O'Neil, Frank (Lowell Textile Inst.) Senior Textile Technician, Worsted Division, Pacific
Mills, Lawrence, Mass.

Rappaport, Gladys, B.A. (Hunter) Jr. Stat. Stat. Research Group, Columbia Univ.,
2120 Tiebout Ave., Bronx 57, New York.

Rice, Assoc. Prof. Nelson, Ph.D. (C. U. of A.) 3326 13th St. N.E., Wash., 17, D.C.

Schell, Emil, M.A. (Western Reserve) Stat. Employment Stat. Div. 3440 ‘NV. 12 kd.
Arlington, Va.

Schneberger, Richard, (Cert. to teach in Tech High School Training for Industry State
Programs) Edison Gen. Elec. Appl. Co., 5600 W. Taylor St., Chgo., Ill.

Simon, Geo., Ed. M. (Harvard) Capt., A. C. Avia. Psych. Psych. Section, Surgeon, Hq.
AFTRC, Ft. Worth 2, Texas.

Spaulding, Asa, M.A. (Michigan) Actuary & Asst. Sec. No. Carolina Mut. Life. Ins. Co.
Durham, North Carolina.

Spoerl, Charles, B.A. (Harvard) Asst. Treas. % Aetna Life Ins. Co. Hartford, Conn.

Springer, Wm., C.E. (Columbia) Asst. Vice Pres. in charge of Research, Bristol-Myers Co.
Hillside 5, New Jersey.

Stock, J. Stevens, M.A. (American) Lt. USNR, Hd. Stat. Sect. Div. of Shore Est. &
Civilian Per. Navy Dept., 8508 Garfield St., Bethesda, Maryland.

Stott, Alex, A.B. (Harvard) Lt. Comdr. USNR, 2800 Devonshire Pl., N.W., Wash. 8, D.C.

Taylor, Thomas, Ph.D. (Yale) Research Engineer, U.S. Testing Co. 45 Grover Lane,
Caldwell, N. J.

Treanor, Glen, B.A. (Minnesota) Principal Tax Economist, Bus. & Ind. Research Div.,
Income Tax Unit, Bur. of Int. Rev., Room 2232, Wash., D.C.

Wherry, Robert, Ph.D. (Ohio State) On leave Dept. Psych. Univ. of N. C., Civilian Head,
Stat. Anal. Unit AGO Personnel Research Section, 270 Madison Ave., N. Y.

Wilcoxon, Frank, Ph.D. (Cornell) Group Leader, Insecticide & Fungicide, La., Amer.
Cyanamid Co., Stamford, Conn. R.D. #1 Box 39a, Riverside, Conn.

Wolff, Marion, A.B. (Hunter) Asst. Math. Stat. Stat. Research Group Div. of War
Research Columbia University 1724 Crotona Park East, New York 60, N.Y.































Unknown Addresses 










Recent mail has not been delivered to the following members of the Institute 
at the addresses listed. If anyone knows of the current address of one or more 
of these members, please notify the Secretary-Treasurer at once. 


Lt. (jg) Gordon L. Beckstead—Aer. Navy 151 % Fleet Postmaster, San Fran., Cal. 
Dr. Charles Wm. Cotterman—637 Hawthorne Road, Winston Salem, North Carolina 
Mr. James Davidson—Box 344, Christiansburg, Virginia 

S/sgt George Elmstrom—Det. of Pat., Hospital Plant. 4176 APO % PM, NYC, N. Y. 
Mr. Henry Goldberg—401 W. 118th St. New York 27, New York 

Mr. Henry Hebley—Box 166, Pittsburgh 30, Pennsylvania 

Mr. John Mandel—45 Kew Gardens Road, Kew Gardens, New York 

Mr. David F. Votaw, Jr., USNTC—Bainbridge, Maryland 

Mr. Edward F. Wilson—Keswick Colony, Keswick Grove, New Jersey 












REPORT ON THE RUTGERS MEETING OF THE INSTITUTE 


The Eighth Summer Meeting of the Institute of Mathematical Statistics 
was held at the New Jersey College for Women, Rutgers University, New Bruns- 
wick, New Jersey on Sunday, September 16, 1945, where the Summer Meeting 
of the American Mathematical Society was also being held. The following 
115 members of the Institute attended the meeting: 


C. B. Allendoerfer, R. L. Anderson, T. W. Anderson, H. E. Arnold, I. L. Battin, Archie 
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, G. W. Brown, R. H. Brown, T. 
H. Brown, T. A. Budne, R. S. Burington, B. H. Camp, A. G. Carlton, P. C. Clifford, E. P. 
Coleman, T. F. Cope, G. M. Cox, H. B. Curry, J. H. Curtiss, J. F. Daly, J. H. Davidson, B. 
B. Day, W. E. Deming, H. F. Dodge, Jacques Dutka, P. S. Dwyer, Churchill Eisenhart, 
Wade Ellis, Mary Elveback, Benjamin Epstein, C. D. Ferris, C. H. Fischer, M. M. Flood, 
R. M. Foster, Milton Friedman, J. P. Gill, M. A. Girshick, Casper Goffman, A. A. Goodman, 
Dorothy K. Gottfried, T. N. E. Greville, F. E. Grubbs, K. W. Halbert, Marshall Hall, P. 
R. Halmos, Miriam S. Harold, Millard Hastay, Bernard Hecht, William Hodgkinson, I. S.
Hoffer, Harold Hotelling, A. S. Householder, W. Hurwicz, Irving Kaplansky, C. J. Kirchen,
Jack Laderman, Rafael Laguardia, H. G. Landau, Howard Levene, Harriet Levine, S. B.
Littauer, A. T. Lonseth, P. J. McCarthy, W. G. Madow, J. W. Mauchly, E. B. Mode, D. J.
Morrow, J. E. Morton, Judith Moss, P. M. Neurath, M. L. Norden, H. W. Norton, C. O.
Oakley, P. S. Olmstead, Edward Paulson, John Riordan, H. E. Robbins, H. G. Romig, William
Salkind, M. M. Sandomire, Arthur Sard, F. E. Satterthwaite, L. J. Savage, Henry Scheffé,
Bernice Scherl, Edward Schrock, I. E. Segal, C. E. Shannon, L. W. Shaw, Herbert Solomon,
Mortimer Spiegelman, J. R. Steen, Arthur Stein, F. F. Stephan, A. P. Stergion, L. V.
Toralballa, Mary N. Torrey, A. W. Tucker, L. R. Tucker, J. W. Tukey, Helen M. Walker,
W. A. Wallis, R. M. Walter, B. T. Weber, Joseph Weinstein, A. E. R. Westman, Frank
Wilcoxon, S. S. Wilks, Jacob Wolfowitz, C. P. Winsor, Ruth Zwerling.


The first session, on Sunday morning, was devoted to a symposium on Se- 
quential Analysis. Professor W. Allen Wallis, of Stanford University and Colum- 
bia Statistical Research Group, acted as chairman for this session. The fol- 
lowing invited addresses were given. 


1. Theory of Sequential Analysis. 
Professor A. Wald, Columbia University and Columbia Statistical Research Group. 

2. Construction of Multiple Sampling Inspection Plans for Attributes from Sequential 
Principles. 
Mr. Milton Friedman, National Bureau of Economic Research and Columbia 
Statistical Research Group. 

3. Applications of Sequential Analysis to the Ranking of Two Populations with Respect to 
a Single Parameter. 
Mr. M. A. Girshick, Bureau of Agricultural Economics and Columbia Statistical Re- 
search Group. 


The morning session was concluded after lively discussion on the symposium 
topic. 

Dr. W. Edwards Deming, of the Bureau of the Budget and President of the 
Institute, presided at the afternoon session. The following papers were pre- 
sented: 


317 







1. On The Variance of a Random Set in n Dimensions.
Dr. Herbert E. Robbins, Post Graduate School, Annapolis.
2. The Non-Central Wishart Distribution and its Application to Problems In Multivariate
Statistics.
Dr. T. W. Anderson, Jr., Princeton University.
3. The Effect on a Distribution Function of Small Changes in the Population Function.
Professor Burton H. Camp, Wesleyan University.
4. On Composite Distributions.
Dr. Casper Goffman and Dr. Benjamin Epstein, Westinghouse Electric Corp.
5. Population, Expected Values and Sample.
Professor Emil J. Gumbel, New School for Social Research.
6. On the Selection of a Sample in Repeated Steps.
Dr. W. G. Madow, Bureau of the Census.
7. On Optimum Estimates for Stratified Samples.
Mr. Morris H. Hansen and Mr. William N. Hurwitz, Bureau of the Census. Presented
by Margaret Gurney.
8. Pearsonian Correlation Coefficients Associated With Least Squares Theory (Presented
by Title).
Professor P. S. Dwyer, University of Michigan.


The afternoon session concluded with the report of the Committee on the 
Teaching of Statistics which was presented by Professor Harold Hotelling of 
Columbia University. 


P. S. DWYER,
Secretary 








